Early stopping and Callbacks

This example shows how to use the get_trials_callback parameter of auto-sklearn to implement early stopping through a callback.

The callback receives the result of each model and hyperparameter configuration evaluated by SMAC, the optimizer underlying auto-sklearn. By checking the cost of each result, we can implement a simple yet effective early-stopping mechanism.

Note, however, that the callback only sees the individual models, not the ensembles that auto-sklearn builds from them. You may want a more sophisticated stopping rule that ensures there are enough good models for auto-sklearn to build an ensemble with. This example is kept deliberately simple.

from __future__ import annotations

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue

Build and fit a classifier

def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if a single run achieves a very low cost.

    The return value tells SMAC whether to stop. Returning False stops
    the search process; any other value, including None, lets it continue.
    """
    # The callback parameters are described in the SMAC documentation:
    # https://automl.github.io/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False


X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")

Stopping!
RunInfo(config=Configuration(values={
  'balancing:strategy': 'none',
  'classifier:__choice__': 'extra_trees',
  'classifier:extra_trees:bootstrap': 'False',
  'classifier:extra_trees:criterion': 'gini',
  'classifier:extra_trees:max_depth': 'None',
  'classifier:extra_trees:max_features': 0.5707983257382487,
  'classifier:extra_trees:max_leaf_nodes': 'None',
  'classifier:extra_trees:min_impurity_decrease': 0.0,
  'classifier:extra_trees:min_samples_leaf': 3,
  'classifier:extra_trees:min_samples_split': 11,
  'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
  'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
  'feature_preprocessor:__choice__': 'polynomial',
  'feature_preprocessor:polynomial:degree': 2,
  'feature_preprocessor:polynomial:include_bias': 'False',
  'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.6877820491790771, status=<StatusType.SUCCESS: 1>, starttime=1663663263.9033709, endtime=1663663265.6127412, additional_info={'duration': 1.5963304042816162, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})

AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      get_trials_callback=<function callback at 0x7f05d16c1f70>,
                      per_run_time_limit=30, time_left_for_this_task=120)
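
The introduction notes that a more sophisticated mechanism might wait until there are enough good models to build an ensemble with. One possible way to do that is sketched below: a factory that returns a callback with the same four-argument signature as the one above, but which only returns False once several runs have reached the cost threshold. The names make_early_stopping_callback and min_good_models are hypothetical and not part of auto-sklearn or SMAC. The SimpleNamespace objects stand in for SMAC's RunValue purely so that the sketch runs without starting a search.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Callable


def make_early_stopping_callback(
    cost_threshold: float = 0.02,
    min_good_models: int = 3,
) -> Callable[[Any, Any, Any, float], bool | None]:
    """Build a callback that stops once enough runs reach the threshold.

    Hypothetical helper for illustration; not part of auto-sklearn or SMAC.
    """
    good_models = 0  # number of runs seen so far with cost <= cost_threshold

    def callback(smbo: Any, run_info: Any, result: Any, time_left: float) -> bool | None:
        nonlocal good_models
        if result.cost <= cost_threshold:
            good_models += 1
        if good_models >= min_good_models:
            return False  # False asks SMAC to stop the search
        return None  # any other value lets the search continue

    return callback


# Dry run with made-up costs (stand-ins for RunValue, not real results).
stop_after_two = make_early_stopping_callback(cost_threshold=0.03, min_good_models=2)
fake_costs = [0.10, 0.028, 0.05, 0.014]
decisions = [
    stop_after_two(None, None, SimpleNamespace(cost=cost), 100.0)
    for cost in fake_costs
]
print(decisions)  # [None, None, None, False]
```

Because the factory returns a plain function, it could be passed as get_trials_callback=make_early_stopping_callback(...) in the same way as the callback above. The right values for the threshold and the model count depend on the dataset and are left to the reader.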

View the models found by auto-sklearn

print(automl.leaderboard())

          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.687782
2            2             0.10  random_forest  0.028369  2.002935
3            3             0.22            mlp  0.028369  1.103178
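
The cost column holds the same quantity that the callback compares against its 0.02 threshold. As a rough reading aid, the sketch below converts the leaderboard costs to accuracies. It assumes the default accuracy metric, for which the cost would be 1 - accuracy on auto-sklearn's internal validation data; that relationship is an assumption here, not something shown in the output above. The helper cost_to_accuracy is hypothetical.

```python
def cost_to_accuracy(cost: float) -> float:
    """Convert a cost to an accuracy, ASSUMING cost = 1 - accuracy."""
    return 1.0 - cost


# Costs copied from the leaderboard above, keyed by model_id.
leaderboard_costs = {7: 0.014184, 2: 0.028369, 3: 0.028369}

accuracies = {
    model_id: cost_to_accuracy(cost)
    for model_id, cost in leaderboard_costs.items()
}
for model_id, accuracy in accuracies.items():
    print(f"model {model_id}: validation accuracy ~ {accuracy:.4f}")

# Under the same assumption, the callback's threshold of 0.02 corresponds
# to a validation accuracy of at least 0.98.
threshold_accuracy = cost_to_accuracy(0.02)
```

Under this reading, only model 7 met the threshold, which is consistent with the search stopping on its run.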

Print the final ensemble constructed by auto-sklearn

pprint(automl.show_models(), indent=4)

{   2: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d61c9370>,
           'cost': 0.028368794326241176,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d45ee910>,
           'ensemble_weight': 0.1,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61c9400>,
           'model_id': 2,
           'rank': 1,
           'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)},
    3: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d24ae4c0>,
           'cost': 0.028368794326241176,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05e9de29a0>,
           'ensemble_weight': 0.22,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d3db8670>,
           'model_id': 3,
           'rank': 2,
           'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0001363185819149026, beta_1=0.999,
              beta_2=0.9, early_stopping=True,
              hidden_layer_sizes=(115, 115, 115),
              learning_rate_init=0.00018009776276177523, max_iter=32,
              n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
    7: {   'balancing': Balancing(random_state=1),
           'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d457f820>,
           'cost': 0.014184397163120588,
           'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d0d67f40>,
           'ensemble_weight': 0.68,
           'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61aa1f0>,
           'model_id': 7,
           'rank': 3,
           'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
                     n_estimators=512, n_jobs=1, random_state=1,
                     warm_start=True)}}

Get the Score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score: 0.9440559440559441

Total running time of the script: (0 minutes 22.430 seconds)
