Early stopping and Callbacks

The example below shows how we can use the get_trials_callback parameter of auto-sklearn to implement an early-stopping mechanism through a callback.

These callbacks give access to the result of each model and hyperparameter configuration evaluated by SMAC, the optimizer underlying auto-sklearn. By checking the cost of a result, we can implement a simple yet effective early-stopping mechanism.

Note, however, that this does not provide any access to the ensembles that auto-sklearn produces, only to the individual models. You may wish to use a more sophisticated early-stopping criterion that waits until enough good models are available for auto-sklearn to build an ensemble from; a sketch of such a variant follows the simple callback below. The callback shown here is kept deliberately simple.

from __future__ import annotations

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue

Build and fit a classifier

def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if we get a very low cost value for a single run

    The return value indicates to SMAC whether to stop or not. False will
    stop the search process while any other value will mean it continues.
    """
    # Details on the callback parameters can be found in the SMAC documentation:
    # https://automl.github.io/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False
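
# As a rough sketch (not part of the original example), an early-stopping
# criterion that waits for several good runs rather than stopping on the
# first one could look like the callable below. The class name, cost
# threshold and run count are arbitrary choices for illustration; an
# instance would be passed as get_trials_callback=PatientStopper() in
# place of the simpler callback above.
class PatientStopper:
    """Stop only once several runs have reached a low cost, so that the
    ensemble builder has more than one good model to work with."""

    def __init__(self, cost_threshold: float = 0.02, min_good_runs: int = 3) -> None:
        self.cost_threshold = cost_threshold
        self.min_good_runs = min_good_runs
        self.good_runs = 0

    def __call__(
        self,
        smbo: SMBO,
        run_info: RunInfo,
        result: RunValue,
        time_left: float,
    ) -> bool | None:
        if result.cost <= self.cost_threshold:
            self.good_runs += 1
        if self.good_runs >= self.min_good_runs:
            print(f"Stopping after {self.good_runs} runs with cost <= {self.cost_threshold}")
            return False
        return None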


X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
Fitting to the training data:   0%|          | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]
Fitting to the training data:   1%|          | 1/120 [00:01<02:00,  1.01s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|1         | 2/120 [00:02<01:58,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|2         | 3/120 [00:03<01:57,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   3%|3         | 4/120 [00:04<01:56,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   4%|4         | 5/120 [00:05<01:55,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   5%|5         | 6/120 [00:06<01:54,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   6%|5         | 7/120 [00:07<01:53,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   7%|6         | 8/120 [00:08<01:52,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|7         | 9/120 [00:09<01:51,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|8         | 10/120 [00:10<01:50,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   9%|9         | 11/120 [00:11<01:49,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  10%|#         | 12/120 [00:12<01:48,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  11%|#         | 13/120 [00:13<01:47,  1.00s/it, The total time budget for this task is 0:02:00]Stopping!
RunInfo(config=Configuration(values={
  'balancing:strategy': 'none',
  'classifier:__choice__': 'extra_trees',
  'classifier:extra_trees:bootstrap': 'False',
  'classifier:extra_trees:criterion': 'gini',
  'classifier:extra_trees:max_depth': 'None',
  'classifier:extra_trees:max_features': 0.5707983257382487,
  'classifier:extra_trees:max_leaf_nodes': 'None',
  'classifier:extra_trees:min_impurity_decrease': 0.0,
  'classifier:extra_trees:min_samples_leaf': 3,
  'classifier:extra_trees:min_samples_split': 11,
  'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
  'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
  'feature_preprocessor:__choice__': 'polynomial',
  'feature_preprocessor:polynomial:degree': 2,
  'feature_preprocessor:polynomial:include_bias': 'False',
  'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.7881081104278564, status=<StatusType.SUCCESS: 1>, starttime=1669292713.2752733, endtime=1669292715.0874944, additional_info={'duration': 1.6978273391723633, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})

Fitting to the training data:  12%|#1        | 14/120 [00:14<01:46,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [00:14<00:00,  8.55it/s, The total time budget for this task is 0:02:00]

AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      get_trials_callback=<function callback at 0x7f2af4709310>,
                      per_run_time_limit=30, time_left_for_this_task=120)

View the models found by auto-sklearn

print(automl.leaderboard())
          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.788108
2            2             0.10  random_forest  0.028369  2.042358
3            3             0.22            mlp  0.028369  1.138246
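
The leaderboard only summarizes the models. To inspect the actual pipeline configurations behind these entries, auto-sklearn's show_models() returns the models in the final ensemble; printing it with the pprint import from above is a convenient way to browse them (output omitted here).

# Show the pipelines that made it into the final ensemble
# (a dict keyed by model_id; output omitted).
pprint(automl.show_models(), indent=4)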

Get the score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
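
Note that the 0.02 threshold in the callback refers to SMAC's internal cost for a single run which, assuming the default accuracy metric, is 1 minus that model's validation accuracy; it is not the test error of the final ensemble. For comparison, the ensemble's test error can be printed explicitly:

# The callback threshold (0.02) is a validation cost reported by SMAC,
# not this held-out test error; printed here only for comparison.
print("Test error:", 1 - sklearn.metrics.accuracy_score(y_test, predictions))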

Total running time of the script: (0 minutes 20.441 seconds)
