smac.epm.rf_with_instances module

class smac.epm.rf_with_instances.RandomForestWithInstances(types: numpy.ndarray, bounds: numpy.ndarray, num_trees: int = 10, do_bootstrapping: bool = True, n_points_per_tree: int = -1, ratio_features: float = 0.8333333333333334, min_samples_split: int = 3, min_samples_leaf: int = 3, max_depth: int = 20, eps_purity: int = 1e-08, max_num_nodes: int = 1048576, seed: int = 42, **kwargs)

Bases: smac.epm.base_epm.AbstractEPM

Interface to the random forest that takes instance features into account.

rf_opts

Random forest hyperparameter

n_points_per_tree

int

rf

regression.binary_rss_forest – Only available after training

hypers

list – List of random forest hyperparameters

seed

int

types

list

bounds

list

rng

np.random.RandomState

logger

logging.Logger

Constructor

Parameters:
  • types (np.ndarray (D)) – Specifies the number of categorical values of an input dimension, where the i-th entry corresponds to the i-th input dimension. For example, with 2 dimensions where the first consists of 3 different categorical choices and the second is continuous, we have to pass np.array([2, 0]). Note that we count starting from 0.
  • bounds (np.ndarray (D, 2)) – Specifies the bounds for continuous features.
  • num_trees (int) – The number of trees in the random forest.
  • do_bootstrapping (bool) – Turns on / off bootstrapping in the random forest.
  • n_points_per_tree (int) – Number of points per tree. If <= 0, X.shape[0] will be used in _train(X, y) instead.
  • ratio_features (float) – The ratio of features that are considered for splitting.
  • min_samples_split (int) – The minimum number of data points to perform a split.
  • min_samples_leaf (int) – The minimum number of data points in a leaf.
  • max_depth (int) – The maximum depth of a single tree.
  • eps_purity (float) – The minimum difference between two target values to be considered different.
  • max_num_nodes (int) – The maximum total number of nodes in a tree.
  • seed (int) – The seed that is passed to the random_forest_run library.
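
As a minimal sketch of the constructor inputs described above, the following builds types and bounds for the two-dimensional example from the types description (one categorical dimension, one continuous) using only NumPy. The bound values are illustrative assumptions, and resolve_points_per_tree is a hypothetical helper that is not part of this module; it only mirrors the documented fallback to X.shape[0] when n_points_per_tree <= 0.

```python
import numpy as np

# types: one entry per input dimension, using the values from the
# docstring's example (first dimension categorical, second continuous).
types = np.array([2, 0])

# bounds: shape (D, 2). The concrete numbers are assumptions chosen for
# illustration; the categorical row is only a placeholder.
bounds = np.array([[0.0, 2.0], [0.0, 1.0]])


def resolve_points_per_tree(n_points_per_tree: int, X: np.ndarray) -> int:
    """Hypothetical helper: fall back to the number of rows in X."""
    return X.shape[0] if n_points_per_tree <= 0 else n_points_per_tree


X = np.zeros((5, 2))
default_points = resolve_points_per_tree(-1, X)   # uses X.shape[0]
explicit_points = resolve_points_per_tree(3, X)   # keeps the given value
```
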
predict(X: numpy.ndarray)

Predict means and variances for given X.

Parameters: X (np.ndarray of shape = [n_samples, n_features (config + instance features)]) – Input samples to predict for
Returns:
  • means (np.ndarray of shape = [n_samples, n_objectives]) – Predictive mean
  • vars (np.ndarray of shape = [n_samples, n_objectives]) – Predictive variance
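
To illustrate the shape of the returned arrays, here is a rough sketch of one common way a forest can turn per-tree predictions into a predictive mean and variance. It is not taken from this module: the function aggregate_trees is hypothetical, it assumes a single objective, and it uses the spread across trees as the variance, which may differ from what the underlying library computes.

```python
from typing import Tuple

import numpy as np


def aggregate_trees(per_tree: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Combine per-tree predictions of shape [n_trees, n_samples].

    Returns means and variances, each of shape [n_samples, 1].
    """
    means = per_tree.mean(axis=0).reshape(-1, 1)
    variances = per_tree.var(axis=0).reshape(-1, 1)
    return means, variances


# Two trees, two samples: the trees disagree on the first sample only.
per_tree = np.array([[1.0, 2.0],
                     [3.0, 2.0]])
means, variances = aggregate_trees(per_tree)
```
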
predict_marginalized_over_instances(X: numpy.ndarray)

Predict mean and variance marginalized over all instances.

Returns the predictive mean and variance marginalized over all instances for a set of configurations.

Parameters: X (np.ndarray) – [n_samples, n_features (config)]
Returns:
  • means (np.ndarray of shape = [n_samples, 1]) – Predictive mean
  • vars (np.ndarray of shape = [n_samples, 1]) – Predictive variance
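
The idea of marginalizing over instances can be sketched as follows: pair every configuration with every instance feature vector, predict for each pair, and average over the instances. The names marginalize_over_instances, predict_fn and toy_predict are hypothetical and exist only for this illustration. The sketch covers the mean only, since how the variance is combined depends on the implementation.

```python
import numpy as np


def marginalize_over_instances(configs, instance_features, predict_fn):
    """Average predicted means over all instances for each configuration.

    configs: [n_configs, n_config_features]
    instance_features: [n_instances, n_instance_features]
    predict_fn: maps [n_rows, n_features] to predicted means [n_rows]
    Returns an array of shape [n_configs, 1].
    """
    n_configs = configs.shape[0]
    n_instances = instance_features.shape[0]
    # Rows are ordered: config 0 with every instance, then config 1, ...
    X = np.hstack([
        np.repeat(configs, n_instances, axis=0),
        np.tile(instance_features, (n_configs, 1)),
    ])
    means = predict_fn(X).reshape(n_configs, n_instances)
    return means.mean(axis=1, keepdims=True)


def toy_predict(X):
    """Stand-in model for the example: sums the features of each row."""
    return X.sum(axis=1)


configs = np.array([[0.0], [1.0]])
instance_features = np.array([[10.0], [20.0]])
marginal = marginalize_over_instances(configs, instance_features, toy_predict)
```
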
train(X: numpy.ndarray, Y: numpy.ndarray, **kwargs)

Trains the EPM on X and Y.

Parameters:
  • X (np.ndarray [n_samples, n_features (config + instance features)]) – Input data points.
  • Y (np.ndarray [n_samples, n_objectives]) – The corresponding target values. n_objectives must match the number of target names specified in the constructor.
Returns:

self

Return type:

AbstractEPM
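
Before calling train, it can help to check that X and Y have the documented shapes. The helper below is a hypothetical sketch and not part of this module; it assumes that a one-dimensional Y is meant as a single objective and reshapes it to [n_samples, 1].

```python
import numpy as np


def check_training_shapes(X, Y):
    """Validate X as [n_samples, n_features] and Y as [n_samples, n_objectives]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be 2-dimensional, got %d dimensions" % X.ndim)
    if Y.ndim == 1:
        # Assumption: a flat Y holds a single objective per sample.
        Y = Y.reshape(-1, 1)
    if Y.ndim != 2:
        raise ValueError("Y must be 1- or 2-dimensional, got %d dimensions" % Y.ndim)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    return X, Y


X_checked, Y_checked = check_training_shapes(
    [[0.1, 5.0], [0.9, 5.0], [0.5, 7.0]],  # config + instance features
    [1.0, 2.0, 1.5],                        # one objective per sample
)
```
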