Early stopping and Callbacks

The example below shows how we can use the get_trials_callback parameter of auto-sklearn to implement an early-stopping mechanism through a callback.

These callbacks give access to the result of each model and hyperparameter configuration evaluated by SMAC, the optimizer underlying auto-sklearn. By checking the cost of a result, we can implement a simple yet effective early-stopping mechanism.

Note, however, that this does not provide any access to the ensembles that auto-sklearn produces, only to the individual models. You may wish to use a more sophisticated early-stopping criterion that waits until enough good models are available for auto-sklearn to build an ensemble from; a sketch of such a variant follows the simple callback below. The callback shown here is kept deliberately simple.

from __future__ import annotations

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue

Build and fit a classifier

def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if we get a very low cost value for a single run

    The return value indicates to SMAC whether to stop or not. False will
    stop the search process while any other value will mean it continues.
    """
    # Details on the callback parameters can be found in the SMAC documentation:
    # https://automl.github.io/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False
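
# As a rough sketch (not part of the original example), an early-stopping
# criterion that waits for several good runs rather than stopping on the
# first one could look like the callable below. The class name, cost
# threshold and run count are arbitrary choices for illustration; an
# instance would be passed as get_trials_callback=PatientStopper() in
# place of the simpler callback above.
class PatientStopper:
    """Stop only once several runs have reached a low cost, so that the
    ensemble builder has more than one good model to work with."""

    def __init__(self, cost_threshold: float = 0.02, min_good_runs: int = 3) -> None:
        self.cost_threshold = cost_threshold
        self.min_good_runs = min_good_runs
        self.good_runs = 0

    def __call__(
        self,
        smbo: SMBO,
        run_info: RunInfo,
        result: RunValue,
        time_left: float,
    ) -> bool | None:
        if result.cost <= self.cost_threshold:
            self.good_runs += 1
        if self.good_runs >= self.min_good_runs:
            print(f"Stopping after {self.good_runs} runs with cost <= {self.cost_threshold}")
            return False
        return None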


X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
Fitting to the training data:   0%|          | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]
Fitting to the training data:   1%|          | 1/120 [00:01<02:00,  1.01s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|1         | 2/120 [00:02<01:58,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|2         | 3/120 [00:03<01:57,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   3%|3         | 4/120 [00:04<01:56,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   4%|4         | 5/120 [00:05<01:55,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   5%|5         | 6/120 [00:06<01:54,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   6%|5         | 7/120 [00:07<01:53,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   7%|6         | 8/120 [00:08<01:52,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|7         | 9/120 [00:09<01:51,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|8         | 10/120 [00:10<01:50,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   9%|9         | 11/120 [00:11<01:49,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  10%|#         | 12/120 [00:12<01:48,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  11%|#         | 13/120 [00:13<01:47,  1.00s/it, The total time budget for this task is 0:02:00]Stopping!
RunInfo(config=Configuration(values={
  'balancing:strategy': 'none',
  'classifier:__choice__': 'extra_trees',
  'classifier:extra_trees:bootstrap': 'False',
  'classifier:extra_trees:criterion': 'gini',
  'classifier:extra_trees:max_depth': 'None',
  'classifier:extra_trees:max_features': 0.5707983257382487,
  'classifier:extra_trees:max_leaf_nodes': 'None',
  'classifier:extra_trees:min_impurity_decrease': 0.0,
  'classifier:extra_trees:min_samples_leaf': 3,
  'classifier:extra_trees:min_samples_split': 11,
  'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
  'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
  'feature_preprocessor:__choice__': 'polynomial',
  'feature_preprocessor:polynomial:degree': 2,
  'feature_preprocessor:polynomial:include_bias': 'False',
  'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.7881081104278564, status=<StatusType.SUCCESS: 1>, starttime=1669292713.2752733, endtime=1669292715.0874944, additional_info={'duration': 1.6978273391723633, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})

Fitting to the training data:  12%|#1        | 14/120 [00:14<01:46,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [00:14<00:00,  8.55it/s, The total time budget for this task is 0:02:00]

AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      get_trials_callback=<function callback at 0x7f2af4709310>,
                      per_run_time_limit=30, time_left_for_this_task=120)

View the models found by auto-sklearn

print(automl.leaderboard())
          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.788108
2            2             0.10  random_forest  0.028369  2.042358
3            3             0.22            mlp  0.028369  1.138246
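
The leaderboard only summarizes the models. To inspect the actual pipeline configurations behind these entries, auto-sklearn's show_models() returns the models in the final ensemble; printing it with the pprint import from above is a convenient way to browse them (output omitted here).

# Show the pipelines that made it into the final ensemble
# (a dict keyed by model_id; output omitted).
pprint(automl.show_models(), indent=4)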

Get the score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
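
Note that the 0.02 threshold in the callback refers to SMAC's internal cost for a single run which, assuming the default accuracy metric, is 1 minus that model's validation accuracy; it is not the test error of the final ensemble. For comparison, the ensemble's test error can be printed explicitly:

# The callback threshold (0.02) is a validation cost reported by SMAC,
# not this held-out test error; printed here only for comparison.
print("Test error:", 1 - sklearn.metrics.accuracy_score(y_test, predictions))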

Total running time of the script: (0 minutes 20.441 seconds)
