smac.epm.random_forest

Classes

BaseModel(configspace, types, bounds, seed)

class smac.epm.random_forest.BaseModel(configspace, types, bounds, seed, instance_features=None, pca_components=None)[source]

Bases: smac.epm.base_epm.BaseEPM

class smac.epm.random_forest.RandomForestWithInstances(configspace, types, bounds, seed, log_y=False, num_trees=10, do_bootstrapping=True, n_points_per_tree=-1, ratio_features=0.8333333333333334, min_samples_split=3, min_samples_leaf=3, max_depth=1048576, eps_purity=1e-08, max_num_nodes=1048576, instance_features=None, pca_components=None)[source]

Bases: smac.epm.random_forest.BaseModel

Random forest that takes instance features into account.

Parameters
  • types (List[int]) – Specifies the number of categorical values of each input dimension, where the i-th entry corresponds to the i-th input dimension. For example, with 2 dimensions where the first consists of 3 different categorical choices and the second is continuous, we have to pass [3, 0]. Note that we count starting from 0.

  • bounds (List[Tuple[float, float]]) – Bounds of the input dimensions: (lower, upper) for continuous dims; (n_cat, np.nan) for categorical dims.

  • seed (int) – The seed that is passed to the random_forest_run library.

  • log_y (bool) – If True, the y values passed to this RF are expected to be log(y) transformed; this is taken into account during prediction.

  • num_trees (int) – The number of trees in the random forest.

  • do_bootstrapping (bool) – Turns on / off bootstrapping in the random forest.

  • n_points_per_tree (int) – Number of points per tree. If <= 0, X.shape[0] will be used in _train(X, y) instead.

  • ratio_features (float) – The ratio of features that are considered for splitting.

  • min_samples_split (int) – The minimum number of data points to perform a split.

  • min_samples_leaf (int) – The minimum number of data points in a leaf.

  • max_depth (int) – The maximum depth of a single tree.

  • eps_purity (float) – The minimum difference between two target values for them to be considered different.

  • max_num_nodes (int) – The maximum total number of nodes in a tree.

  • instance_features (np.ndarray (I, K)) – Contains the K dimensional instance features of the I different instances

  • pca_components (float) – Number of components to keep when using PCA to reduce the dimensionality of the instance features. Requires n_feats to be set (> pca_dims).
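The encoding of types and bounds described in the parameter list can be illustrated with a minimal sketch. It does not call SMAC; the search_space structure and the build_types_and_bounds helper are hypothetical names introduced only for this example, and the sketch assumes a space with one categorical dimension (3 choices) and one continuous dimension.

```python
import numpy as np

# Hypothetical description of a 2-dimensional search space: each entry is
# either a list of categorical choices or a (lower, upper) range.
search_space = [
    ["red", "green", "blue"],  # dimension 0: categorical with 3 choices
    (0.0, 1.0),                # dimension 1: continuous
]


def build_types_and_bounds(space):
    """Hypothetical helper: derive `types` and `bounds` as documented above."""
    types, bounds = [], []
    for dim in space:
        if isinstance(dim, list):
            n_cat = len(dim)
            types.append(n_cat)             # number of categorical values
            bounds.append((n_cat, np.nan))  # (n_cat, np.nan) for categorical dims
        else:
            lower, upper = dim
            types.append(0)                 # 0 marks a continuous dimension
            bounds.append((float(lower), float(upper)))
    return types, bounds


types, bounds = build_types_and_bounds(search_space)
print(types)   # [3, 0]
print(bounds)  # [(3, nan), (0.0, 1.0)]
```

The two lists are index-aligned: entry i of types and entry i of bounds both describe input dimension i.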

rf_opts

Random forest hyperparameter

Type

regression.rf_opts

n_points_per_tree
Type

int

rf

Only available after training

Type

regression.binary_rss_forest

hypers

List of random forest hyperparameters

Type

list

unlog_y
Type

bool

seed
Type

int

types
Type

np.ndarray

bounds
Type

list

rng
Type

np.random.RandomState

logger
Type

logging.Logger

predict_marginalized_over_instances(X)[source]

Predict mean and variance marginalized over all instances.

Returns the predictive mean and variance marginalized over all instances for a set of configurations.

Note

This method overrides the method of the same name in smac.epm.base_epm.AbstractEPM. It is random forest specific and follows the SMAC2 implementation. It requires no distribution assumption to marginalize the uncertainty estimates.

Parameters

X (np.ndarray) – [n_samples, n_features (config)]

Return type

Tuple[ndarray, ndarray]

Returns

  • means (np.ndarray of shape = [n_samples, 1]) – Predictive mean

  • vars (np.ndarray of shape = [n_samples, 1]) – Predictive variance
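One distribution-free way to marginalize over instances can be sketched as follows. This is an illustration under stated assumptions, not the library's code: tree_predictors is a hypothetical list of callables standing in for trained trees, each configuration is paired with every instance, each tree's predictions are averaged over the instances, and the mean and variance are then taken across trees. The actual implementation may differ in details such as log-space handling or variance clipping.

```python
import numpy as np


def marginalize_over_instances(X, instance_features, tree_predictors):
    """Sketch of marginalizing tree predictions over instances.

    X: [n_samples, n_features (config)]
    instance_features: [I, K]
    tree_predictors: hypothetical callables mapping [I, n_features + K] -> [I]
    """
    n_samples = X.shape[0]
    n_instances = instance_features.shape[0]
    per_tree = np.zeros((n_samples, len(tree_predictors)))
    for i, x in enumerate(X):
        # Pair this configuration with every instance.
        X_inst = np.hstack((np.tile(x, (n_instances, 1)), instance_features))
        for t, predict in enumerate(tree_predictors):
            # Average one tree's predictions over all instances.
            per_tree[i, t] = np.mean(predict(X_inst))
    # Mean and variance across trees, with no distribution assumption.
    means = per_tree.mean(axis=1).reshape(-1, 1)
    vars_ = per_tree.var(axis=1).reshape(-1, 1)
    return means, vars_


# Toy stand-ins for two trained trees (assumptions for illustration only).
trees = [
    lambda Z: Z[:, 0] + Z[:, 1],
    lambda Z: Z[:, 0] + 2.0 * Z[:, 1],
]
X = np.array([[1.0], [2.0]])     # two configurations, one feature each
inst = np.array([[0.0], [2.0]])  # two instances, one feature each
means, vars_ = marginalize_over_instances(X, inst, trees)
print(means.shape, vars_.shape)  # (2, 1) (2, 1)
```

Both returned arrays have shape [n_samples, 1], matching the documented return values.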

Modules

smac.epm.random_forest.rf_mo

smac.epm.random_forest.rf_with_instances

smac.epm.random_forest.rfr_imputator