smac.epm.gaussian_process module

class smac.epm.gaussian_process.GaussianProcess(configspace: ConfigSpace.configuration_space.ConfigurationSpace, types: List[int], bounds: List[Tuple[float, float]], seed: int, kernel: skopt.learning.gaussian_process.kernels.Kernel, normalize_y: bool = True, n_opt_restarts: int = 10, instance_features: Optional[numpy.ndarray] = None, pca_components: Optional[int] = None)[source]

Bases: smac.epm.base_gp.BaseModel

Gaussian process model.

The GP hyperparameters are obtained by optimizing the marginal log likelihood.

This code is based on the implementation of RoBO:

Klein, A., Falkner, S., Mansur, N., and Hutter, F. RoBO: A Flexible and Robust Bayesian Optimization Framework in Python. In: NIPS 2017 Bayesian Optimization Workshop.

Parameters
  • types (List[int]) – Specifies the number of categorical values of each input dimension, where the i-th entry corresponds to the i-th input dimension. For example, with 2 dimensions where the first consists of 3 different categorical choices and the second is continuous, we have to pass [3, 0]. Note that we count starting from 0.

  • bounds (List[Tuple[float, float]]) – Bounds of the input dimensions: (lower, upper) for continuous dims; (n_cat, np.nan) for categorical dims

  • seed (int) – Model seed.

  • kernel (skopt.learning.gaussian_process.kernels.Kernel) – Specifies the kernel that is used for the Gaussian process

  • prior (prior object) – Defines a prior for the hyperparameters of the GP. Make sure that it implements the Prior interface.

  • normalize_y (bool) – Zero mean unit variance normalization of the output values

  • n_opt_restarts (int) – Number of restarts for GP hyperparameter optimization

  • instance_features (np.ndarray (I, K)) – Contains the K dimensional instance features of the I different instances

  • pca_components (Optional[int]) – Number of components to keep when using PCA to reduce dimensionality of instance features. Requires n_feats to be set (> pca_dims).
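The types and bounds conventions described above can be illustrated with a small sketch. It assumes a hypothetical two-dimensional search space (one categorical dimension with 3 choices, one continuous dimension on an assumed range of [0.0, 1.0]); the helper variables n_categories and continuous_ranges are illustrative names and not part of SMAC.

```python
import numpy as np

# Hypothetical search space matching the example above:
# dimension 0 is categorical with 3 choices, dimension 1 is continuous.
n_categories = [3, 0]                # 0 marks a continuous dimension
continuous_ranges = {1: (0.0, 1.0)}  # assumed range for the continuous dimension

# types: number of categorical values per dimension.
types = list(n_categories)

# bounds: (n_cat, np.nan) for categorical dims, (lower, upper) for continuous dims.
bounds = []
for i, n_cat in enumerate(n_categories):
    if n_cat > 0:
        bounds.append((float(n_cat), np.nan))
    else:
        bounds.append(continuous_ranges[i])

print(types)
print(bounds)
```

In practice these two lists are usually derived from the ConfigurationSpace rather than written by hand.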

Abstract base class for all Gaussian process models.

_get_gp() → skopt.learning.gaussian_process.gpr.GaussianProcessRegressor[source]
_nll(theta: numpy.ndarray) → Tuple[float, numpy.ndarray][source]

Returns the negative marginal log likelihood (+ the prior) for a hyperparameter configuration theta. The value is negated because scipy minimize is used for optimization.

Parameters

theta (np.ndarray(H)) – Hyperparameter vector. Note that all hyperparameters are on a log scale.

Returns

lnlikelihood + prior

Return type

float
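To make the quantity concrete, here is a minimal NumPy sketch of a negative log marginal likelihood for a GP with an RBF kernel and log-scale hyperparameters. It is not SMAC's implementation: the kernel form, the three-element layout of theta, and the function name are assumptions made for illustration, and the sketch omits both the prior term and the gradient.

```python
import numpy as np

def neg_log_marginal_likelihood(theta, X, y):
    """Illustrative only. theta = log([signal_variance, length_scale, noise_variance])."""
    # Hyperparameters live on a log scale, so exponentiate first.
    signal_var, length_scale, noise_var = np.exp(theta)

    # RBF kernel matrix with observation noise on the diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = signal_var * np.exp(-0.5 * sq_dists / length_scale ** 2)
    K[np.diag_indices_from(K)] += noise_var

    # Cholesky-based evaluation of log p(y | X, theta).
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = (
        -0.5 * y @ alpha
        - np.sum(np.log(np.diag(L)))
        - 0.5 * len(y) * np.log(2.0 * np.pi)
    )
    # Negate so that a minimizer can be used.
    return -lml
```

A minimizer applied to this function finds the hyperparameters that maximize the marginal log likelihood.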

_optimize() → numpy.ndarray[source]

Optimizes the marginal log likelihood and returns the best found hyperparameter configuration theta.

Returns

theta – Hyperparameter vector that maximizes the marginal log likelihood

Return type

np.ndarray(H)
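The restart strategy implied by n_opt_restarts can be sketched as a generic multi-start minimization with scipy.optimize.minimize. This is an assumption-laden illustration rather than SMAC's code: the function name, the uniform sampling of start points, and the toy objective are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def multi_start_minimize(objective, bounds, n_restarts=10, seed=0):
    """Run a local optimizer from several random starts and keep the best result."""
    rng = np.random.RandomState(seed)
    lower, upper = np.array(bounds).T
    best_theta, best_value = None, np.inf
    for _ in range(n_restarts):
        # Assumed here: start points are drawn uniformly within the bounds.
        theta0 = rng.uniform(lower, upper)
        result = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
        if result.fun < best_value:
            best_theta, best_value = result.x, result.fun
    return best_theta, best_value

# Toy stand-in for a negative log marginal likelihood, minimized at theta = (1, 1).
def toy_objective(theta):
    return float(np.sum((theta - 1.0) ** 2))

best_theta, best_value = multi_start_minimize(toy_objective, [(-5.0, 5.0)] * 2)
print(best_theta, best_value)
```

Restarting from several points reduces the chance of returning a poor local optimum, since the marginal likelihood surface is generally non-convex.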

_predict(X_test: numpy.ndarray, cov_return_type: Optional[str] = 'diagonal_cov') → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]

Returns the predictive mean and variance of the objective function at the given test points.

Parameters
  • X_test (np.ndarray (N, D)) – Input test points

  • cov_return_type (typing.Optional[str]) – Specifies what to return along with the mean. Refer to predict() for more information.

Returns

  • np.array(N,) – predictive mean

  • np.array(N,) or np.array(N, N) or None – predictive variance or standard deviation
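The different return shapes can be illustrated with a small NumPy sketch of GP prediction. It is not the SMAC implementation: the RBF kernel, the fixed noise level, and the function names are assumptions, and the option strings other than 'diagonal_cov' ('diagonal_std', 'full_cov') are assumed values chosen to match the return types listed above.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-6, cov_return_type="diagonal_cov"):
    """Illustrative GP posterior at X_test given (X_train, y_train)."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                      # shape (N,)
    if cov_return_type is None:
        return mean, None
    v = np.linalg.solve(L, K_s)
    cov = rbf(X_test, X_test) - v.T @ v       # shape (N, N)
    if cov_return_type == "full_cov":
        return mean, cov
    var = np.clip(np.diag(cov), 0.0, None)    # guard against tiny negative values
    if cov_return_type == "diagonal_std":
        return mean, np.sqrt(var)
    return mean, var                          # "diagonal_cov"
```

Returning only the diagonal avoids materializing an N by N matrix for the caller when pointwise uncertainty is all that is needed.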

_train(X: numpy.ndarray, y: numpy.ndarray, do_optimize: bool = True) → smac.epm.gaussian_process.GaussianProcess[source]

Computes the Cholesky decomposition of the covariance of X and estimates the GP hyperparameters by optimizing the marginal log likelihood. The prior mean of the GP is set to the empirical mean of X.

Parameters
  • X (np.ndarray (N, D)) – Input data points. The dimensionality of X is (N, D), where N is the number of points and D is the number of features.

  • y (np.ndarray (N,)) – The corresponding target values.

  • do_optimize (boolean) – If set to true, the hyperparameters are optimized; otherwise, the default hyperparameters of the kernel are used.
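The constructor's normalize_y option is described as zero mean, unit variance normalization of the output values. A minimal sketch of that kind of standardization of y, together with the inverse transform for predictions, is given here. The function names are hypothetical and the handling of a zero standard deviation is an assumption.

```python
import numpy as np

def normalize_targets(y):
    """Standardize y to zero mean and unit variance; return the statistics too."""
    y_mean, y_std = np.mean(y), np.std(y)
    if y_std == 0:
        y_std = 1.0  # assumed fallback for constant targets
    return (y - y_mean) / y_std, y_mean, y_std

def unnormalize_mean(mean_pred, y_mean, y_std):
    """Map a predicted mean back to the original target scale."""
    return mean_pred * y_std + y_mean

def unnormalize_variance(var_pred, y_std):
    """Map a predicted variance back to the original target scale."""
    return var_pred * y_std ** 2
```

Note that variances scale with the square of the standard deviation, while means are shifted and scaled linearly.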

sample_functions(X_test: numpy.ndarray, n_funcs: int = 1) → numpy.ndarray[source]

Samples F function values from the current posterior at the N specified test points.

Parameters
  • X_test (np.ndarray (N, D)) – Input test points

  • n_funcs (int) – Number of function values that are drawn at each test point.

Returns

function_samples – The F function values drawn at the N test points.

Return type

np.array(F, N)
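Drawing function samples from a GP posterior amounts to sampling from a multivariate normal with the posterior mean and full covariance at the test points. The sketch below shows this with NumPy, assuming the mean and covariance are already available; the function name and the placeholder posterior are illustrative and not taken from SMAC.

```python
import numpy as np

def draw_function_samples(mean, cov, n_funcs=1, seed=0):
    """Draw n_funcs joint samples at N test points; result has shape (F, N)."""
    rng = np.random.RandomState(seed)
    # Each row is one sampled function evaluated at all N test points.
    return rng.multivariate_normal(mean, cov, size=n_funcs)

# Placeholder posterior at N = 3 test points (standard normal for illustration).
mean = np.zeros(3)
cov = np.eye(3)
function_samples = draw_function_samples(mean, cov, n_funcs=5)
print(function_samples.shape)
```

Sampling jointly from the full covariance, rather than independently per point, keeps the correlations between test points that make each row a coherent function draw.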