GP hierarchy
neps.optimizers.bayesian_optimization.models.gp_hierarchy
ComprehensiveGPHierarchy
ComprehensiveGPHierarchy(
graph_kernels: Iterable,
hp_kernels: Iterable,
likelihood: float = 0.001,
weights=None,
learn_all_h=False,
graph_feature_ard=True,
d_graph_features: int = 0,
normalize_combined_kernel=True,
hierarchy_consider: list = None,
vectorial_features: list = None,
combined_kernel: str = "sum",
verbose: bool = False,
surrogate_model_fit_args: dict = None,
gpytorch_kinv: bool = False,
)
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
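As a rough illustration of the combined_kernel="sum" and weights arguments, the sketch below forms a weighted sum of per-kernel Gram matrices (one per graph/HP kernel). The helper name and the diagonal normalization are illustrative assumptions, not the class's actual internals.

```python
import torch

# Illustration only: how a weighted "sum" combination of per-kernel Gram
# matrices (cf. the `combined_kernel="sum"` and `weights` arguments) could be
# formed. This mirrors the idea, not the class's exact implementation.
def combine_gram_matrices(grams: list[torch.Tensor],
                          weights: torch.Tensor | None = None,
                          normalize: bool = True) -> torch.Tensor:
    if weights is None:
        # Equal weighting over all graph/HP kernels by default.
        weights = torch.full((len(grams),), 1.0 / len(grams))
    combined = torch.zeros_like(grams[0])
    for w, K in zip(weights, grams):
        if normalize:
            # Normalize each Gram matrix: K[i, j] <- K[i, j] / sqrt(K[i, i] * K[j, j]).
            d = torch.sqrt(torch.diag(K))
            K = K / torch.outer(d, d)
        combined = combined + w * K
    return combined

# Two toy 3x3 PSD Gram matrices standing in for a graph kernel and an HP kernel.
K_graph = torch.tensor([[2.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0]])
K_hp = torch.eye(3)
K_combined = combine_gram_matrices([K_graph, K_hp])
```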
dmu_dphi
Compute the derivative of the GP posterior mean at the specified input location with respect to the vector embedding of the graph (e.g., if using WL-subtree, this function computes the gradient w.r.t. each subtree pattern).
The derivative is given by $ \frac{\partial \mu^*}{\partial \phi^*} = \frac{\partial K(\phi, \phi^*)}{\partial \phi^*} K(\phi, \phi)^{-1} \mathbf{y} $,
which follows directly from the GP posterior mean formula. Since the terms $K(\phi, \phi)^{-1}$ and $\mathbf{y}$ are both independent of the test points ($X_s$, or $\phi^*$), the posterior gradient is simply the matrix product of the kernel gradient, the inverse Gram matrix, and the training label vector.
Parameters
X_s: The locations on which the GP posterior mean derivatives should be evaluated. If left blank, the derivatives will be evaluated at the training points.
compute_grad_var: bool. If true, also compute the gradient variance.
The derivative of a GP is also a GP, and thus the predictive distribution of the posterior gradient is Gaussian. The posterior mean is given above, and the posterior variance is: $ \mathbb{V}\left[\frac{\partial f^*}{\partial \phi^*}\right] = \frac{\partial^2 k(\phi^*, \phi^*)}{\partial {\phi^*}^2} - \frac{\partial k(\phi^*, \Phi)}{\partial \phi^*} K(X, X)^{-1} \frac{\partial k(\Phi, \phi^*)}{\partial \phi^*} $
Returns
A list of K torch.Tensor of shape N x D, where N is the length of the X_s list (each element of which is a networkx graph), K is the number of kernel_operators in the combined kernel, and D is the dimensionality of the feature vector (determined by the specific graph kernel).
OR
A list of K torch.Tensor of shape D, if the averaged_over_samples flag is enabled.
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
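The sketch below makes the posterior-gradient formula concrete on plain feature vectors, using an RBF kernel as a stand-in for the graph-kernel embedding $\phi$; the names and kernel choice are assumptions for illustration, and autograd is used in place of the analytic kernel derivative.

```python
import torch

# Standalone illustration of the posterior-mean-gradient formula above.
def rbf(a: torch.Tensor, b: torch.Tensor, lengthscale: float = 1.0) -> torch.Tensor:
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2) / lengthscale**2)

torch.manual_seed(0)
Phi = torch.randn(10, 4)                           # training embeddings, N x D
y = torch.randn(10)                                # training targets
phi_star = torch.randn(1, 4, requires_grad=True)   # test embedding

K = rbf(Phi, Phi) + 1e-3 * torch.eye(10)           # K(phi, phi) plus likelihood jitter
K_inv_y = torch.linalg.solve(K, y)                 # K(phi, phi)^{-1} y

# Posterior mean at phi*: mu* = K(phi*, phi) K^{-1} y
mu_star = rbf(phi_star, Phi) @ K_inv_y

# d mu*/d phi*: autograd yields the same quantity as the analytic expression
# (d K(phi, phi*)/d phi*) K^{-1} y, since K^{-1} y does not depend on phi*.
(grad_mu,) = torch.autograd.grad(mu_star.sum(), phi_star)
```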
predict
predict(x_configs, preserve_comp_graph: bool = False)
Kriging predictions
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
predict_single_hierarchy
predict_single_hierarchy(
x_configs,
hierarchy_id=0,
preserve_comp_graph: bool = False,
)
Kriging predictions
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
NumericalWarning
Bases: RuntimeWarning
Warning thrown when convergence criteria are not met, or when computations require extra stability.
cholesky_jitter
Bases: _dtype_value_context
The jitter value used by psd_safe_cholesky when using Cholesky solves.
- Default for float: 1e-6
- Default for double: 1e-8
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
cholesky_max_tries
The maximum number of attempts (with successively increasing jitter) that psd_safe_cholesky makes before raising an error.
verbose_linalg
Bases: _feature_flag
Print out information whenever running an expensive linear algebra routine (e.g. Cholesky, CG, Lanczos, CIQ, etc.) (Default: False)
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
compute_log_marginal_likelihood
compute_log_marginal_likelihood(
K_i: Tensor,
logDetK: Tensor,
y: Tensor,
normalize: bool = True,
log_prior_dist=None,
)
Compute the zero-mean Gaussian process log marginal likelihood given the inverse of the Gram matrix K(x2, x2), its log determinant, and the training label vector y.
Options:
- normalize: normalize the log marginal likelihood by the length of the label vector, as per the gpytorch routine.
- log_prior_dist: a PyTorch distribution object. If specified, the hyperparameter prior is taken into account and Type-II MAP is used instead of Type-II MLE (compute the log posterior instead of the log evidence).
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
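For reference, here is a standalone sketch of the zero-mean GP log marginal likelihood, written from the standard formula using the Gram matrix K directly (the library function instead takes the precomputed inverse and log determinant):

```python
import math
import torch

# log p(y) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2*pi)
def log_marginal_likelihood(K: torch.Tensor, y: torch.Tensor,
                            normalize: bool = True) -> torch.Tensor:
    n = y.shape[0]
    L = torch.linalg.cholesky(K)                                   # K = L L^T
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L).squeeze(-1)   # K^{-1} y
    logdet = 2.0 * torch.log(torch.diagonal(L)).sum()              # log|K|
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * math.log(2 * math.pi)
    # Optionally normalize by the number of training points, as described above.
    return lml / n if normalize else lml
```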
compute_pd_inverse
Compute the inverse of a positive-(semi)definite matrix K using Cholesky inversion.
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
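A minimal sketch of Cholesky-based inversion as named above; whether the library returns the log determinant of K or of K^{-1} alongside the inverse is an assumption here:

```python
import torch

def pd_inverse(K: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    L = torch.linalg.cholesky(K)             # K = L L^T
    eye = torch.eye(K.shape[0], dtype=K.dtype)
    K_inv = torch.cholesky_solve(eye, L)     # solves K X = I column by column
    # log|K^{-1}| comes for free from the factor's diagonal.
    log_det_K_inv = -2.0 * torch.log(torch.diagonal(L)).sum()
    return K_inv, log_det_K_inv
```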
get_grad
Average across the samples via a Monte Carlo sampling scheme; also estimates the empirical variance.
Parameters
average_occurrences: if True, do a weighted summation based on the frequency distribution of the occurrences to compute one gradient per feature. Otherwise, each distinct occurrence ($\phi_i = k$) gets its own gradient estimate.
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
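A rough sketch of the Monte Carlo reduction described above, assuming one gradient estimate per sample; the exact weighting semantics of average_occurrences in the library may differ:

```python
import torch

def average_gradients(per_sample_grads: torch.Tensor,
                      occurrence_counts: torch.Tensor | None = None):
    # per_sample_grads: S x D (one gradient estimate per sample).
    if occurrence_counts is not None:
        # Weighted summation by occurrence frequency (cf. average_occurrences=True).
        w = occurrence_counts / occurrence_counts.sum()
        mean = (w.unsqueeze(-1) * per_sample_grads).sum(dim=0)
    else:
        mean = per_sample_grads.mean(dim=0)
    # Empirical variance of the per-sample gradient estimates.
    var = per_sample_grads.var(dim=0, unbiased=True)
    return mean, var
```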
psd_safe_cholesky
Compute the Cholesky decomposition of A. If A is only p.s.d., add a small jitter to the diagonal.
Args:
A (Tensor): The tensor to compute the Cholesky decomposition of.
upper (bool, optional): See torch.cholesky.
out (Tensor, optional): See torch.cholesky.
jitter (float, optional): The jitter to add to the diagonal of A in case A is only p.s.d. If omitted, uses settings.cholesky_jitter.value().
max_tries (int, optional): Number of attempts (with successively increasing jitter) to make before raising an error.
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
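A simplified sketch of the jitter-and-retry strategy this function describes; the jitter schedule and warning text are illustrative, not the library's exact behaviour:

```python
import warnings
import torch

def psd_safe_cholesky_sketch(A: torch.Tensor, jitter: float = 1e-6,
                             max_tries: int = 3) -> torch.Tensor:
    try:
        # Plain factorization first: works whenever A is strictly p.d.
        return torch.linalg.cholesky(A)
    except torch.linalg.LinAlgError:
        pass
    eye = torch.eye(A.shape[0], dtype=A.dtype)
    for i in range(max_tries):
        current_jitter = jitter * (10 ** i)   # successively increasing jitter
        try:
            L = torch.linalg.cholesky(A + current_jitter * eye)
            warnings.warn(f"A is not p.d.; added jitter of {current_jitter} to the diagonal.")
            return L
        except torch.linalg.LinAlgError:
            continue
    raise RuntimeError("Matrix is not positive definite, even after adding jitter.")
```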
standardize_x
Standardize the vectorial input into a d-dimensional hypercube [0, 1]^d, where d is the number of features. If x_min and x_max are supplied, x2 will be standardized using these instead. This is used when standardizing the validation/test inputs.
Source code in neps/optimizers/bayesian_optimization/models/gp_hierarchy.py
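A minimal sketch of the min-max standardization described above, reusing the training x_min/x_max for validation/test inputs; names are illustrative:

```python
import torch

def standardize_x_sketch(x: torch.Tensor,
                         x_min: torch.Tensor | None = None,
                         x_max: torch.Tensor | None = None):
    if x_min is None:
        # Per-feature minimum/maximum computed from the (training) inputs.
        x_min, x_max = x.min(dim=0).values, x.max(dim=0).values
    x_std = (x - x_min) / (x_max - x_min)   # maps each feature into [0, 1]
    return x_std, x_min, x_max

train_x = torch.rand(8, 3) * 10
train_std, x_min, x_max = standardize_x_sketch(train_x)
# Validation/test inputs are rescaled with the training statistics.
test_std, _, _ = standardize_x_sketch(torch.rand(2, 3) * 10, x_min, x_max)
```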
unnormalize_y
Similar to the undoing of the pre-processing step above, but applied to the output predictions.
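A one-line sketch of this unnormalization, assuming the targets were normalized to zero mean and unit variance (how the statistics are stored internally is an assumption):

```python
import torch

def unnormalize_y_sketch(y_norm: torch.Tensor, y_mean: torch.Tensor,
                         y_std: torch.Tensor) -> torch.Tensor:
    # Undo zero-mean / unit-variance normalization of the predictions.
    return y_norm * y_std + y_mean
```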