Gp
neps.optimizers.bayesian_optimization.models.gp
ComprehensiveGP
ComprehensiveGP(
graph_kernels: Iterable,
hp_kernels: Iterable,
likelihood: float = 0.001,
weights=None,
vectorial_features: list = None,
combined_kernel: str = "sum",
logger=None,
surrogate_model_fit_args: dict = None,
)
Source code in neps/optimizers/bayesian_optimization/models/gp.py
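A construction sketch based only on the signature above. The kernel objects and their concrete types are placeholders, not part of the documented API beyond the parameter names:

```python
from neps.optimizers.bayesian_optimization.models.gp import ComprehensiveGP

# Placeholder kernel objects: substitute the actual graph/HP kernel
# implementations available in your NePS installation.
my_graph_kernel = ...  # e.g. a WL-subtree graph kernel instance (hypothetical)
my_hp_kernel = ...     # e.g. a stationary kernel over scalar HPs (hypothetical)

gp = ComprehensiveGP(
    graph_kernels=[my_graph_kernel],
    hp_kernels=[my_hp_kernel],
    likelihood=1e-3,        # observation-noise jitter on the Gram diagonal
    combined_kernel="sum",  # combine the graph and HP kernels additively
)
```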
dmu_dphi
#
Compute the derivative of the GP posterior mean at the specified input location with respect to the vector embedding of the graph (e.g., if using WL-subtree, this function computes the gradient wrt each subtree pattern)
The derivative is given by

$$\frac{\partial \mu^*}{\partial \phi^*} = \frac{\partial K(\phi, \phi^*)}{\partial \phi^*} K(\phi, \phi)^{-1} \mathbf{y},$$

which follows directly from the GP posterior mean formula: since $K(\phi, \phi)^{-1}$ and $\mathbf{y}$ are both independent of the test points ($X_s$, or $\phi^*$), the posterior gradient is simply the matrix product of the kernel gradient, the inverse Gram matrix, and the training label vector.
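A minimal PyTorch sketch of this matrix product, as an illustration of the formula rather than the library's internal code; the argument names and shapes are assumptions:

```python
import torch

# dk_dphi: (D, N) derivative of the cross-kernel K(phi, phi*) wrt phi*
# K_inv:   (N, N) inverse Gram matrix K(phi, phi)^{-1}
# y:       (N,)   training label vector
def posterior_mean_gradient(dk_dphi: torch.Tensor,
                            K_inv: torch.Tensor,
                            y: torch.Tensor) -> torch.Tensor:
    # d mu*/d phi* = (dK/dphi*) @ K^{-1} @ y, a (D,) vector
    return dk_dphi @ K_inv @ y
```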
Parameters#
X_s: The locations on which the GP posterior mean derivatives should be evaluated. If left blank, the derivatives will be evaluated at the training points.
compute_grad_var: bool. If true, also compute the gradient variance.
The derivative of a GP is also a GP, so the predictive distribution of the posterior gradient is Gaussian. The posterior mean is given above, and the posterior variance is

$$\mathbb{V}\left[\frac{\partial f^*}{\partial \phi^*}\right] = \frac{\partial^2 k(\phi^*, \phi^*)}{\partial {\phi^*}^2} - \frac{\partial k(\phi^*, \Phi)}{\partial \phi^*} K(X, X)^{-1} \frac{\partial k(\Phi, \phi^*)}{\partial \phi^*}.$$
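A matching sketch for the variance term, again with assumed names and shapes:

```python
import torch

# d2k:     (D, D) second derivative of k(phi*, phi*) wrt phi*
# dk_dphi: (D, N) derivative of the cross-kernel k(phi*, Phi) wrt phi*
# K_inv:   (N, N) inverse Gram matrix
def posterior_gradient_variance(d2k: torch.Tensor,
                                dk_dphi: torch.Tensor,
                                K_inv: torch.Tensor) -> torch.Tensor:
    # V[df*/dphi*] = d2k - dk @ K^{-1} @ dk^T, a (D, D) matrix
    return d2k - dk_dphi @ K_inv @ dk_dphi.T
```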
Returns#
A list of K torch.Tensor of shape N x D, where N is the length of the X_s list (each element of which is a networkx graph), K is the number of kernel_operators in the combined kernel, and D is the dimensionality of the feature vector (determined by the specific graph kernel).
OR
A list of K torch.Tensor of shape D, if the averaged_over_samples flag is enabled.
Source code in neps/optimizers/bayesian_optimization/models/gp.py
predict
predict(x_configs, preserve_comp_graph: bool = False)
Kriging (GP posterior) predictions for the given configurations.
Source code in neps/optimizers/bayesian_optimization/models/gp.py
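A hedged usage sketch: I am assuming here that `predict` returns the posterior mean and covariance for the supplied configurations; check the linked source for the exact return signature.

```python
# Hypothetical usage: assumes predict returns (mean, covariance).
mu, cov = gp.predict(x_configs)
std = cov.diag().clamp(min=0).sqrt()  # per-point predictive std from a full covariance
```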
compute_log_marginal_likelihood
compute_log_marginal_likelihood(
K_i: Tensor,
logDetK: Tensor,
y: Tensor,
normalize: bool = True,
log_prior_dist=None,
)
Compute the zero-mean Gaussian process log marginal likelihood given the inverse of the Gram matrix $K(X, X)$, its log determinant, and the training label vector y. Options:
normalize: normalize the log marginal likelihood by the length of the label vector, as per the gpytorch routine.
log_prior_dist: a PyTorch distribution object. If specified, the hyperparameter prior is taken into account and we use Type-II MAP instead of Type-II MLE (i.e., compute the log posterior instead of the log evidence).
Source code in neps/optimizers/bayesian_optimization/models/gp.py
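For reference, a self-contained sketch of the quantity this routine evaluates; the sign convention for the log-determinant argument is my assumption (I treat it as $\log |K|$), so check the linked source before relying on it:

```python
import math
import torch

# Sketch of the zero-mean GP log marginal likelihood (not the library's code):
# log p(y) = -0.5 * y^T K^{-1} y - 0.5 * log|K| - (n/2) * log(2*pi)
def log_marginal_likelihood(K_inv: torch.Tensor,
                            logdet_K: torch.Tensor,
                            y: torch.Tensor,
                            normalize: bool = True) -> torch.Tensor:
    n = y.shape[0]
    lml = -0.5 * (y @ K_inv @ y) - 0.5 * logdet_K - 0.5 * n * math.log(2 * math.pi)
    # As in gpytorch, optionally normalize by the number of training points.
    return lml / n if normalize else lml
```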
compute_pd_inverse
Compute the inverse of a positive-(semi)definite matrix K using Cholesky inversion.
Source code in neps/optimizers/bayesian_optimization/models/gp.py
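A sketch of the standard recipe, assuming torch tensors; the jitter value is an illustrative choice:

```python
import torch

def pd_inverse(K: torch.Tensor, jitter: float = 1e-6):
    n = K.shape[0]
    # A small diagonal jitter keeps the factorization stable for
    # near-singular (semi-definite) matrices.
    L = torch.linalg.cholesky(K + jitter * torch.eye(n, dtype=K.dtype))
    K_inv = torch.cholesky_inverse(L)
    # log|K| falls out of the factor for free: log|K| = 2 * sum(log(diag(L)))
    logdet_K = 2.0 * torch.log(torch.diag(L)).sum()
    return K_inv, logdet_K
```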
get_grad
Average the gradients across samples via a Monte Carlo sampling scheme; also estimate the empirical variance. :param average_occurrences: if True, do a weighted summation based on the frequency distribution of the occurrences to compute a gradient per feature. Otherwise, each distinct occurrence ($\phi_i = k$) gets a different gradient estimate.
Source code in neps/optimizers/bayesian_optimization/models/gp.py
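A sketch of the averaging step, assuming an `(S, D)` tensor of per-sample gradient estimates; the frequency-weighted branch stands in for the `average_occurrences=True` path described above:

```python
import torch

def average_gradients(grads: torch.Tensor, counts: torch.Tensor | None = None):
    # grads: (S, D) sampled gradient estimates; counts: optional (S,) frequencies.
    if counts is not None:
        # Weighted summation: weight each sample by its relative frequency.
        w = counts / counts.sum()
        mean = (w[:, None] * grads).sum(dim=0)
    else:
        mean = grads.mean(dim=0)           # plain Monte Carlo average
    var = grads.var(dim=0, unbiased=True)  # empirical variance across samples
    return mean, var
```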
standardize_x
Standardize the vectorial input into a d-dimensional hypercube $[0, 1]^d$, where d is the number of features. If x_min and x_max are supplied, x is standardized using these instead; this is used when standardizing the validation/test inputs.
Source code in neps/optimizers/bayesian_optimization/models/gp.py
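A sketch of the scaling, assuming torch tensors; reusing the training `x_min`/`x_max` for validation/test inputs is the pattern described above:

```python
import torch

def standardize_x(x: torch.Tensor, x_min=None, x_max=None):
    # Derive per-feature bounds from the data unless they are supplied
    # (e.g. training-set bounds reused for validation/test inputs).
    if x_min is None:
        x_min = x.min(dim=0).values
        x_max = x.max(dim=0).values
    x_std = (x - x_min) / (x_max - x_min)  # map each feature into [0, 1]
    return x_std, x_min, x_max
```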
unnormalize_y
Undo the pre-processing (standardization) step above, but applied to the output predictions, mapping them back to the original scale.
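A sketch of the inverse transform, assuming the labels were standardized with a stored training mean and standard deviation:

```python
import torch

def unnormalize_y(y_pred: torch.Tensor,
                  y_mean: torch.Tensor,
                  y_std: torch.Tensor) -> torch.Tensor:
    # Invert y_norm = (y - y_mean) / y_std to recover the original scale.
    return y_pred * y_std + y_mean
```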