smac.epm.random_forest.rf_with_instances
Classes
RandomForestWithInstances | Random forest that takes instance features into account.
- class smac.epm.random_forest.rf_with_instances.RandomForestWithInstances(configspace, types, bounds, seed, log_y=False, num_trees=10, do_bootstrapping=True, n_points_per_tree=-1, ratio_features=0.8333333333333334, min_samples_split=3, min_samples_leaf=3, max_depth=1048576, eps_purity=1e-08, max_num_nodes=1048576, instance_features=None, pca_components=None)
Bases:
smac.epm.random_forest.BaseModel
Random forest that takes instance features into account.
- Parameters
types (List[int]) – Specifies the number of categorical values of each input dimension, where the i-th entry corresponds to the i-th input dimension. For example, with 2 dimensions where the first consists of 3 different categorical choices and the second is continuous, we have to pass [3, 0]. Note that we count starting from 0.
bounds (List[Tuple[float, float]]) – Bounds of the input dimensions: (lower, upper) for continuous dims; (n_cat, np.nan) for categorical dims.
seed (int) – The seed that is passed to the random_forest_run library.
log_y (bool) – The y values (passed to this RF) are expected to be log(y) transformed; this is taken into account during prediction.
num_trees (int) – The number of trees in the random forest.
do_bootstrapping (bool) – Turns on / off bootstrapping in the random forest.
n_points_per_tree (int) – Number of points per tree. If <= 0, X.shape[0] is used in _train(X, y) instead.
ratio_features (float) – The ratio of features that are considered for splitting.
min_samples_split (int) – The minimum number of data points to perform a split.
min_samples_leaf (int) – The minimum number of data points in a leaf.
max_depth (int) – The maximum depth of a single tree.
eps_purity (float) – The minimum difference between two target values for them to be considered different.
max_num_nodes (int) – The maximum total number of nodes in a tree.
instance_features (np.ndarray (I, K)) – Contains the K dimensional instance features of the I different instances
pca_components (float) – Number of components to keep when using PCA to reduce the dimensionality of the instance features. Requires n_feats to be set (> pca_dims).
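As a rough illustration of the `types` and `bounds` conventions described above, the following sketch builds both lists from a simple description of the input dimensions. The tuple format for a dimension and the `build_types_and_bounds` helper are hypothetical, introduced only for this example; they are not part of SMAC. `math.nan` stands in for `np.nan` so the sketch has no NumPy dependency.

```python
import math
from typing import List, Tuple

# Hypothetical description of one input dimension (not a SMAC type):
#   ("cat", n_choices)      for a categorical dimension
#   ("cont", lower, upper)  for a continuous dimension


def build_types_and_bounds(dims: List[tuple]):
    """Build `types` and `bounds` following the documented conventions."""
    types: List[int] = []
    bounds: List[Tuple[float, float]] = []
    for dim in dims:
        kind = dim[0]
        if kind == "cat":
            n_cat = int(dim[1])
            types.append(n_cat)               # number of categorical choices
            bounds.append((n_cat, math.nan))  # (n_cat, nan) for categorical dims
        elif kind == "cont":
            lower, upper = float(dim[1]), float(dim[2])
            types.append(0)                   # 0 marks a continuous dimension
            bounds.append((lower, upper))     # (lower, upper) for continuous dims
        else:
            raise ValueError(f"unknown dimension kind: {kind!r}")
    return types, bounds


# First dimension: 3 categorical choices; second dimension: continuous in [0, 1].
types, bounds = build_types_and_bounds([("cat", 3), ("cont", 0.0, 1.0)])
print(types)  # [3, 0]
```

The resulting `types` matches the `[3, 0]` example given for the `types` parameter.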
- rf_opts
Random forest hyperparameters
- Type
regression.rf_opts
- n_points_per_tree
- Type
int
- rf
Only available after training
- Type
regression.binary_rss_forest
- hypers
List of random forest hyperparameters
- Type
list
- unlog_y
- Type
bool
- seed
- Type
int
- types
- Type
np.ndarray
- bounds
- Type
list
- rng
- Type
np.random.RandomState
- logger
- Type
logging.Logger
- predict_marginalized_over_instances(X)
Predict mean and variance marginalized over all instances.
Returns the predictive mean and variance marginalised over all instances for a set of configurations.
Note
This method overrides the method of the same name in smac.epm.base_epm.AbstractEPM. It is random forest specific and follows the SMAC2 implementation; it requires no distribution assumption to marginalize the uncertainty estimates.
- Parameters
X (np.ndarray) – [n_samples, n_features (config)]
- Return type
Tuple[ndarray, ndarray]
- Returns
means (np.ndarray of shape = [n_samples, 1]) – Predictive mean
vars (np.ndarray of shape = [n_samples, 1]) – Predictive variance
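The marginalization described above can be sketched in plain Python under a few stated assumptions. Each tree is modeled as a hypothetical callable that takes a configuration vector concatenated with an instance feature vector. Each tree's prediction is first averaged over all instances, and the mean and (population) variance are then taken across trees, so no distribution assumption is needed. This is an illustrative sketch only, not SMAC's actual implementation; `marginalize_over_instances` and the toy trees are invented for the example.

```python
from statistics import fmean, pvariance
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a trained tree: maps a feature vector to a prediction.
Tree = Callable[[Sequence[float]], float]


def marginalize_over_instances(
    trees: List[Tree],
    configs: List[Sequence[float]],
    instance_features: List[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (means, vars), each shaped [n_samples, 1] as nested lists."""
    means: List[List[float]] = []
    variances: List[List[float]] = []
    for config in configs:
        # For each tree, average its prediction over all instances.
        per_tree = [
            fmean([tree(list(config) + list(inst)) for inst in instance_features])
            for tree in trees
        ]
        # Mean and variance across trees (assumption: population variance).
        means.append([fmean(per_tree)])
        variances.append([pvariance(per_tree)])
    return means, variances


# Toy "trees" over one config feature x[0] and one instance feature x[1].
toy_trees: List[Tree] = [
    lambda x: x[0] + x[1],
    lambda x: 2 * x[0] + x[1],
]
means, variances = marginalize_over_instances(
    toy_trees, configs=[[1.0], [2.0]], instance_features=[[0.0], [2.0]]
)
print(means)      # [[2.5], [4.0]]
print(variances)  # [[0.25], [1.0]]
```

For the first configuration, the two toy trees average to 2.0 and 3.0 over the instances, giving a mean of 2.5 and a variance of 0.25 across trees.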