Early stopping and Callbacks
The example below shows how to use the get_trials_callback parameter of auto-sklearn to implement an early-stopping mechanism through a callback.
These callbacks give access to the result of each model and hyperparameter configuration evaluated by SMAC, the underlying optimizer of auto-sklearn. By checking the cost of a result, we can implement a simple yet effective early-stopping mechanism.
Note, however, that the callback does not provide access to the ensembles that auto-sklearn produces, only to the individual models. You may want a more sophisticated early-stopping mechanism that makes sure there are enough good models for auto-sklearn to build an ensemble from. This example is deliberately kept simple.
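As a rough illustration of the "enough good models" idea, the sketch below builds a callback that only stops the search after several runs have reached a low cost. The factory name make_early_stopping_callback and its parameters cost_threshold and min_good_models are hypothetical and not part of auto-sklearn or SMAC; the sketch assumes only that the callback receives the same four arguments as in this example and that returning False stops the search.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Callable


def make_early_stopping_callback(
    cost_threshold: float = 0.02,
    min_good_models: int = 3,
) -> Callable[[Any, Any, Any, float], bool | None]:
    """Build a callback that stops once enough runs reach a low cost.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    good_runs = 0  # number of runs seen so far with cost <= cost_threshold

    def callback(
        smbo: Any, run_info: Any, result: Any, time_left: float
    ) -> bool | None:
        nonlocal good_runs
        if result.cost <= cost_threshold:
            good_runs += 1
        if good_runs >= min_good_models:
            return False  # False asks the optimizer to stop
        return None  # any other value lets the search continue

    return callback


# Dry run with stand-in results; only the `cost` attribute is read.
cb = make_early_stopping_callback(cost_threshold=0.02, min_good_models=2)
decisions = [
    cb(None, None, SimpleNamespace(cost=c), 100.0)
    for c in (0.05, 0.015, 0.03, 0.01)
]
print(decisions)  # [None, None, None, False]
```

The returned function could be passed as get_trials_callback in place of a single-threshold callback. Whether two or three good models are enough for a useful ensemble depends on the dataset, so min_good_models is a value to tune rather than a recommendation.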
from __future__ import annotations
from pprint import pprint
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification
from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue
Build and fit a classifier
def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if we get a very low cost value for a single run

    The return value indicates to SMAC whether to stop or not. False will
    stop the search process while any other value will mean it continues.
    """
    # You can find out the parameters in the SMAC documentation
    # https://automl.github.io/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False


X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
Fitting to the training data: 0%| | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]
Fitting to the training data: 1%| | 1/120 [00:01<02:00, 1.01s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 2%|1 | 2/120 [00:02<01:58, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 2%|2 | 3/120 [00:03<01:57, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 3%|3 | 4/120 [00:04<01:56, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 4%|4 | 5/120 [00:05<01:55, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 5%|5 | 6/120 [00:06<01:54, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 6%|5 | 7/120 [00:07<01:53, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 7%|6 | 8/120 [00:08<01:52, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 8%|7 | 9/120 [00:09<01:51, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 8%|8 | 10/120 [00:10<01:50, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 9%|9 | 11/120 [00:11<01:49, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 10%|# | 12/120 [00:12<01:48, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 11%|# | 13/120 [00:13<01:47, 1.00s/it, The total time budget for this task is 0:02:00]
Stopping!
RunInfo(config=Configuration(values={
'balancing:strategy': 'none',
'classifier:__choice__': 'extra_trees',
'classifier:extra_trees:bootstrap': 'False',
'classifier:extra_trees:criterion': 'gini',
'classifier:extra_trees:max_depth': 'None',
'classifier:extra_trees:max_features': 0.5707983257382487,
'classifier:extra_trees:max_leaf_nodes': 'None',
'classifier:extra_trees:min_impurity_decrease': 0.0,
'classifier:extra_trees:min_samples_leaf': 3,
'classifier:extra_trees:min_samples_split': 11,
'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
'data_preprocessor:__choice__': 'feature_type',
'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
'feature_preprocessor:__choice__': 'polynomial',
'feature_preprocessor:polynomial:degree': 2,
'feature_preprocessor:polynomial:include_bias': 'False',
'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.7881081104278564, status=<StatusType.SUCCESS: 1>, starttime=1669292713.2752733, endtime=1669292715.0874944, additional_info={'duration': 1.6978273391723633, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})
Fitting to the training data: 12%|#1 | 14/120 [00:14<01:46, 1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [00:14<00:00, 8.55it/s, The total time budget for this task is 0:02:00]
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
get_trials_callback=<function callback at 0x7f2af4709310>,
per_run_time_limit=30, time_left_for_this_task=120)
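To make the cost value in the RunValue printed above more concrete, here is a small arithmetic sketch. It assumes that the cost is 1 minus the accuracy measured on an internal validation split, which the output itself does not state. The validation-set size of 141 is a guess chosen because it fits the printed numbers; it is not reported anywhere in the output.

```python
import math

# Costs copied from the printed output of this example.
best_cost = 0.014184397163120588    # run 7, extra_trees
second_cost = 0.028368794326241176  # runs 2 and 3

# Assumption: cost = 1 - accuracy on an internal validation split.
best_accuracy = 1.0 - best_cost
print(round(best_accuracy, 4))  # 0.9858

# Assumption (hypothetical, not in the output): 141 validation samples.
validation_size = 141
print(round(best_cost * validation_size, 6))    # 2.0
print(round(second_cost * validation_size, 6))  # 4.0
```

Under these assumptions, the run that triggered the stop would correspond to about 2 misclassified validation samples, and the two other ensemble members to about 4 each, which also shows how coarse a fixed threshold of 0.02 is on a dataset of this size.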
View the models found by auto-sklearn
print(automl.leaderboard())
          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.788108
2            2             0.10  random_forest  0.028369  2.042358
3            3             0.22            mlp  0.028369  1.138246
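The ensemble_weight column can be read as each model's share in the final prediction. The sketch below assumes the ensemble combines the members' predicted class probabilities by a weighted average, which is how ensemble selection is commonly described; it is not taken from this output. The per-model probabilities are made-up values for a single hypothetical sample.

```python
# Ensemble weights copied from the leaderboard above, keyed by model_id.
weights = {7: 0.68, 2: 0.10, 3: 0.22}

# Hypothetical positive-class probabilities for one sample (illustrative only).
member_probs = {7: 0.90, 2: 0.40, 3: 0.70}

# Normalize by the total weight so the result stays a probability.
total_weight = sum(weights.values())
ensemble_prob = sum(weights[m] * member_probs[m] for m in weights) / total_weight

# Threshold at 0.5 to obtain a binary label.
predicted_label = int(ensemble_prob >= 0.5)

print(round(total_weight, 2))   # 1.0
print(round(ensemble_prob, 3))  # 0.806
print(predicted_label)          # 1
```

With weights like these, the extra_trees model dominates the vote, so stopping as soon as one strong model appears leaves the ensemble leaning heavily on it.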
Print the final ensemble constructed by auto-sklearn
pprint(automl.show_models(), indent=4)
{ 2: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f2b10e36610>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2af4f2dfd0>,
'ensemble_weight': 0.1,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2b10e36b20>,
'model_id': 2,
'rank': 2,
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
3: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f2af4f23760>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2afd2d2130>,
'ensemble_weight': 0.22,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2af3ec6c10>,
'model_id': 3,
'rank': 3,
'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0001363185819149026, beta_1=0.999,
beta_2=0.9, early_stopping=True,
hidden_layer_sizes=(115, 115, 115),
learning_rate_init=0.00018009776276177523, max_iter=32,
n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
7: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f2af76e6fa0>,
'cost': 0.014184397163120588,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2af417c070>,
'ensemble_weight': 0.68,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2afd576bb0>,
'model_id': 7,
'rank': 1,
'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)}}
Get the Score of the final ensemble
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
Total running time of the script: ( 0 minutes 20.441 seconds)