Early stopping and Callbacks¶
The example below shows how we can use the get_trials_callback parameter of auto-sklearn to implement an early-stopping mechanism through a callback.

These callbacks give access to the result of each model and hyperparameter configuration evaluated by SMAC, the optimizer underlying auto-sklearn. By checking the cost of a result, we can implement a simple yet effective early-stopping mechanism.

Do note, however, that this does not provide any access to the ensembles that auto-sklearn produces, only to the individual models. You may wish to implement a more sophisticated stopping criterion, for example one that only stops once there are enough good models for auto-sklearn to build an ensemble with; a minimal sketch of such a criterion follows. The callback used in this example is intentionally kept simple.
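As a purely illustrative sketch (not part of the original example): the hypothetical callback below only stops once three configurations have reached the arbitrary cost threshold of 0.02, giving the ensemble builder more strong candidates to work with.

# Hypothetical variant of the callback used below: wait until several
# configurations have reached a low cost before stopping. The threshold (0.02)
# and the number of required good runs (3) are arbitrary illustrative values.
n_good_runs = 0

def patient_callback(smbo, run_info, result, time_left):
    global n_good_runs
    if result.cost <= 0.02:
        n_good_runs += 1
    # Returning False tells SMAC to stop; any other return value lets it continue.
    if n_good_runs >= 3:
        print("Stopping: several good configurations found!")
        return False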
from __future__ import annotations
from pprint import pprint
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification
from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue
Build and fit a classifier¶
def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if we get a very low cost value for a single run.

    The return value indicates to SMAC whether to stop or not. Returning False
    will stop the search process, while any other value means it continues.
    """
    # You can find out about the parameters in the SMAC documentation
    # https://automl.github.io/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
Stopping!
RunInfo(config=Configuration(values={
'balancing:strategy': 'none',
'classifier:__choice__': 'extra_trees',
'classifier:extra_trees:bootstrap': 'False',
'classifier:extra_trees:criterion': 'gini',
'classifier:extra_trees:max_depth': 'None',
'classifier:extra_trees:max_features': 0.5707983257382487,
'classifier:extra_trees:max_leaf_nodes': 'None',
'classifier:extra_trees:min_impurity_decrease': 0.0,
'classifier:extra_trees:min_samples_leaf': 3,
'classifier:extra_trees:min_samples_split': 11,
'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
'data_preprocessor:__choice__': 'feature_type',
'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
'feature_preprocessor:__choice__': 'polynomial',
'feature_preprocessor:polynomial:degree': 2,
'feature_preprocessor:polynomial:include_bias': 'False',
'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.6877820491790771, status=<StatusType.SUCCESS: 1>, starttime=1663663263.9033709, endtime=1663663265.6127412, additional_info={'duration': 1.5963304042816162, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
get_trials_callback=<function callback at 0x7f05d16c1f70>,
per_run_time_limit=30, time_left_for_this_task=120)
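Note that the cost reported in RunValue is the loss that SMAC minimizes. For auto-sklearn's default classification metric (accuracy) this should correspond to 1 - validation accuracy, so the threshold used in the callback can be derived from a target accuracy. A minimal sketch, assuming the default metric is used:

# Sketch, assuming the default accuracy metric (cost is then 1 - accuracy).
target_accuracy = 0.98
cost_threshold = 1 - target_accuracy  # ~0.02, matching the callback above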
View the models found by auto-sklearn¶
print(automl.leaderboard())
          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.687782
2            2             0.10  random_forest  0.028369  2.002935
3            3             0.22            mlp  0.028369  1.103178
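Since leaderboard() returns a pandas DataFrame, the table can also be filtered or sorted further. A small sketch, reusing the 0.02 threshold from the callback:

# Sketch: the leaderboard is a pandas DataFrame and can be queried directly.
lb = automl.leaderboard()
print(lb[lb["cost"] <= 0.02])  # only the models at or below the stopping threshold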
Print the final ensemble constructed by auto-sklearn¶
pprint(automl.show_models(), indent=4)
{ 2: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d61c9370>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d45ee910>,
'ensemble_weight': 0.1,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61c9400>,
'model_id': 2,
'rank': 1,
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
3: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d24ae4c0>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05e9de29a0>,
'ensemble_weight': 0.22,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d3db8670>,
'model_id': 3,
'rank': 2,
'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0001363185819149026, beta_1=0.999,
beta_2=0.9, early_stopping=True,
hidden_layer_sizes=(115, 115, 115),
learning_rate_init=0.00018009776276177523, max_iter=32,
n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
7: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d457f820>,
'cost': 0.014184397163120588,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d0d67f40>,
'ensemble_weight': 0.68,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61aa1f0>,
'model_id': 7,
'rank': 3,
'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)}}
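Because show_models() returns a plain dictionary keyed by model id (as shown above), individual pipelines can also be inspected programmatically. For example, a short sketch that picks out the fitted scikit-learn estimator of the lowest-cost model:

# Sketch: select the single model with the lowest validation cost and
# inspect its underlying fitted scikit-learn estimator.
models = automl.show_models()
best_id = min(models, key=lambda model_id: models[model_id]["cost"])
print(models[best_id]["sklearn_classifier"])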
Get the Score of the final ensemble¶
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
Total running time of the script: ( 0 minutes 22.430 seconds)