smac.epm.rf_with_instances module

class smac.epm.rf_with_instances.RandomForestWithInstances(configspace: ConfigSpace.configuration_space.ConfigurationSpace, types: List[int], bounds: List[Tuple[float, float]], seed: int, log_y: bool = False, num_trees: int = 10, do_bootstrapping: bool = True, n_points_per_tree: int = - 1, ratio_features: float = 0.8333333333333334, min_samples_split: int = 3, min_samples_leaf: int = 3, max_depth: int = 1048576, eps_purity: float = 1e-08, max_num_nodes: int = 1048576, instance_features: Optional[numpy.ndarray] = None, pca_components: Optional[int] = None)[source]

Bases: smac.epm.base_rf.BaseModel

Random forest that takes instance features into account.

rf_opts

Random forest hyperparameter

Type

regression.rf_opts

n_points_per_tree
Type

int

rf

Only available after training

Type

regression.binary_rss_forest

hypers

List of random forest hyperparameters

Type

list

unlog_y
Type

bool

seed
Type

int

types
Type

np.ndarray

bounds
Type

list

rng
Type

np.random.RandomState

logger
Type

logging.Logger

Parameters
  • types (List[int]) – Specifies the number of categorical values of an input dimension, where the i-th entry corresponds to the i-th input dimension. For example, with two dimensions where the first has 3 different categorical choices and the second is continuous, pass [3, 0]. Note that we count starting from 0.

  • bounds (List[Tuple[float, float]]) – Bounds of the input dimensions: (lower, upper) for continuous dimensions; (n_cat, np.nan) for categorical dimensions.

  • seed (int) – The seed that is passed to the random_forest_run library.

  • log_y (bool) – y values (passed to this RF) are expected to be log(y) transformed; this is taken into account when predicting.

  • num_trees (int) – The number of trees in the random forest.

  • do_bootstrapping (bool) – Turns on / off bootstrapping in the random forest.

  • n_points_per_tree (int) – Number of points per tree. If <= 0, X.shape[0] is used in _train(X, y) instead.

  • ratio_features (float) – The ratio of features that are considered for splitting.

  • min_samples_split (int) – The minimum number of data points to perform a split.

  • min_samples_leaf (int) – The minimum number of data points in a leaf.

  • max_depth (int) – The maximum depth of a single tree.

  • eps_purity (float) – The minimum difference between two target values to be considered different.

  • max_num_nodes (int) – The maximum total number of nodes in a tree.

  • instance_features (np.ndarray (I, K)) – Contains the K-dimensional instance features of the I different instances.

  • pca_components (Optional[int]) – Number of components to keep when using PCA to reduce the dimensionality of the instance features. Requires n_feats to be set (> pca_dims).
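As a minimal sketch of how types and bounds line up, the helper below builds both lists from a simple per-dimension description. The helper name build_types_and_bounds and its input format are hypothetical and exist only for this illustration; they are not part of SMAC.

```python
import math
from typing import List, Tuple, Union

# Hypothetical input format: an int n_cat for a categorical dimension,
# or a (lower, upper) tuple for a continuous dimension.
DimSpec = Union[int, Tuple[float, float]]


def build_types_and_bounds(
    dims: List[DimSpec],
) -> Tuple[List[int], List[Tuple[float, float]]]:
    types: List[int] = []
    bounds: List[Tuple[float, float]] = []
    for dim in dims:
        if isinstance(dim, int):
            # Categorical: types holds the number of choices,
            # bounds holds (n_cat, nan).
            types.append(dim)
            bounds.append((float(dim), math.nan))
        else:
            lower, upper = dim
            # Continuous: types holds 0, bounds holds (lower, upper).
            types.append(0)
            bounds.append((float(lower), float(upper)))
    return types, bounds


# The example from the types description: one dimension with 3 categorical
# choices, followed by one continuous dimension on [0, 1].
types, bounds = build_types_and_bounds([3, (0.0, 1.0)])
```

Here types matches the [3, 0] example given for the types parameter, and the categorical entry of bounds carries nan in its second slot.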

_init_data_container(X: numpy.ndarray, y: numpy.ndarray) → pyrfr.regression.default_data_container[source]

Fills a pyrfr default data container so that the forest knows the categoricals and the bounds for continuous data.

Parameters
  • X (np.ndarray [n_samples, n_features]) – Input data points

  • y (np.ndarray [n_samples, ]) – Corresponding target values

Returns

data – The filled data container that pyrfr can interpret

Return type

regression.default_data_container

_predict(X: numpy.ndarray, cov_return_type: Optional[str] = 'diagonal_cov') → Tuple[numpy.ndarray, numpy.ndarray][source]

Predict means and variances for given X.

Parameters
  • X (np.ndarray) – [n_samples, n_features (config + instance features)]

  • cov_return_type (typing.Optional[str]) – Specifies what to return along with the mean. Refer to predict() for more information.

Returns

  • means (np.ndarray of shape = [n_samples, 1]) – Predictive mean

  • vars (np.ndarray of shape = [n_samples, 1]) – Predictive variance
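To illustrate the documented return shapes, the sketch below forms a mean and a variance from per-tree predictions. It is an assumption-laden illustration only: the actual method delegates to pyrfr, whose variance computation may differ, and both the function name mean_and_var_across_trees and the tree_preds array are hypothetical.

```python
import numpy as np


def mean_and_var_across_trees(tree_preds: np.ndarray):
    """tree_preds: hypothetical array of shape [n_samples, n_trees]."""
    # Statistics are taken over the tree axis, then reshaped to
    # [n_samples, 1] to match the documented return shapes.
    means = tree_preds.mean(axis=1).reshape(-1, 1)
    vars_ = tree_preds.var(axis=1).reshape(-1, 1)
    return means, vars_


# Two samples, two toy trees.
tree_preds = np.array([[1.0, 3.0], [2.0, 2.0]])
means, vars_ = mean_and_var_across_trees(tree_preds)
```

When the trees disagree (first sample) the variance is positive; when they agree (second sample) it is zero.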

_train(X: numpy.ndarray, y: numpy.ndarray) → smac.epm.rf_with_instances.RandomForestWithInstances[source]

Trains the random forest on X and y.

Parameters
  • X (np.ndarray [n_samples, n_features (config + instance features)]) – Input data points.

  • y (np.ndarray [n_samples, ]) – The corresponding target values.

Returns

Return type

self
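The n_points_per_tree parameter description says that a non-positive value falls back to X.shape[0] during training. A minimal sketch of that rule follows; the helper name resolve_points_per_tree is hypothetical and not part of SMAC.

```python
import numpy as np


def resolve_points_per_tree(n_points_per_tree: int, X: np.ndarray) -> int:
    # Non-positive values mean "use every training point".
    if n_points_per_tree <= 0:
        return X.shape[0]
    return n_points_per_tree


X = np.zeros((5, 2))  # 5 training points, 2 features
default_points = resolve_points_per_tree(-1, X)
explicit_points = resolve_points_per_tree(3, X)
```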

predict_marginalized_over_instances(X: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Predict mean and variance marginalized over all instances.

Returns the predictive mean and variance marginalised over all instances for a set of configurations.

Note

This method overrides the method of the same name in smac.epm.base_epm.AbstractEPM. It is random forest specific and follows the SMAC2 implementation; it requires no distribution assumption to marginalize the uncertainty estimates.

Parameters

X (np.ndarray) – [n_samples, n_features (config)]

Returns

  • means (np.ndarray of shape = [n_samples, 1]) – Predictive mean

  • vars (np.ndarray of shape = [n_samples, 1]) – Predictive variance
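One way to marginalize without a distribution assumption is to average each tree's predictions over all instances first, and then take the mean and variance across trees. The sketch below illustrates that idea under stated assumptions: the trees are hypothetical stand-in callables rather than a pyrfr forest, and the function name marginalize_over_instances is made up for this example. It is not SMAC's code.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical stand-in for one trained tree: maps a full input vector
# (config features + instance features) to a scalar prediction.
TreeFn = Callable[[np.ndarray], float]


def marginalize_over_instances(
    X: np.ndarray,
    instance_features: np.ndarray,
    trees: List[TreeFn],
) -> Tuple[np.ndarray, np.ndarray]:
    n_samples = X.shape[0]
    per_tree = np.zeros((n_samples, len(trees)))
    for i, config in enumerate(X):
        # Pair the configuration with every instance.
        inputs = [np.concatenate([config, feat]) for feat in instance_features]
        for t, tree in enumerate(trees):
            # Average over instances inside each tree first.
            per_tree[i, t] = np.mean([tree(x) for x in inputs])
    # Then take statistics across trees.
    means = per_tree.mean(axis=1).reshape(-1, 1)
    vars_ = per_tree.var(axis=1).reshape(-1, 1)
    return means, vars_


# Two toy "trees", one configuration feature, two instances with one
# feature each.
trees = [lambda x: x[0] + x[1], lambda x: x[0] - x[1]]
X = np.array([[1.0], [2.0]])
instance_features = np.array([[0.0], [2.0]])
means, vars_ = marginalize_over_instances(X, instance_features, trees)
```

Note that X here holds configuration features only, as in the parameter description above; the instance features are appended inside the loop.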