Note

Click here to download the full example code or to run this example in your browser via Binder

Fit a single configuration¶

Auto-sklearn searches for the best combination of machine learning algorithms and their hyper-parameter configuration for a given task, using Scikit-Learn Pipelines. To further improve performance, this pipelines are ensemble together using Ensemble Selection from Caruana (2004).

This example shows how one can fit one of this pipelines, both, with an user defined configuration, and a randomly sampled one form the configuration space.

The pipelines that Auto-Sklearn fits are compatible with Scikit-Learn API. You can get further documentation about Scikit-Learn models here: <https://scikit-learn.org/stable/getting_started.html`>_

import numpy as np
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

from ConfigSpace.configuration_space import Configuration

import autosklearn.classification

Data Loading¶

X, y = sklearn.datasets.fetch_openml(data_id=3, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=3
)

Define an estimator¶

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=60,
    memory_limit=4096,
    # We will limit the configuration space only to
    # have RandomForest as a valid model. We recommend enabling all
    # possible models to get a better performance.
    include={"classifier": ["random_forest"]},
    delete_tmp_folder_after_terminate=False,
)

Fit an user provided configuration¶

# We will create a configuration that has a user defined
# min_samples_split in the Random Forest. We recommend you to look into
# how the ConfigSpace package works here:
# https://automl.github.io/ConfigSpace/master/
cs = cls.get_configuration_space(X, y, dataset_name="kr-vs-kp")
config = cs.sample_configuration()
config._values["classifier:random_forest:min_samples_split"] = 11

# Make sure that your changed configuration complies with the configuration space
config.is_valid_configuration()

pipeline, run_info, run_value = cls.fit_pipeline(
    X=X_train,
    y=y_train,
    dataset_name="kr-vs-kp",
    config=config,
    X_test=X_test,
    y_test=y_test,
)

# This object complies with Scikit-Learn Pipeline API.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
print(pipeline.named_steps)

# The fit_pipeline command also returns a named tuple with the pipeline constraints
print(run_info)

# The fit_pipeline command also returns a named tuple with train/test performance
print(run_value)

# We can make sure that our pipeline configuration was honored as follows
print("Passed Configuration:", pipeline.config)
print("Random Forest:", pipeline.named_steps["classifier"].choice.estimator)

# We can also search for new configurations using the fit() method
# Any configurations found by Auto-Sklearn -- even the ones created using
# fit_pipeline() are stored to disk and can be used for Ensemble Selection
cs = cls.fit(X, y, dataset_name="kr-vs-kp")

Fitting to the training data:   0%|          | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
  warnings.warn(

Fitting to the training data:   1%|          | 1/120 [00:01<01:59,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [00:01<00:00, 119.78it/s, The total time budget for this task is 0:02:00]
{'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2af6f3f850>, 'balancing': Balancing(random_state=1, strategy='weighting'), 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2af09a0160>, 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f2af09a0f70>}
RunInfo(config=Configuration(values={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:bootstrap': 'True',
  'classifier:random_forest:criterion': 'gini',
  'classifier:random_forest:max_depth': 'None',
  'classifier:random_forest:max_features': 0.7005875707440749,
  'classifier:random_forest:max_leaf_nodes': 'None',
  'classifier:random_forest:min_impurity_decrease': 0.0,
  'classifier:random_forest:min_samples_leaf': 3,
  'classifier:random_forest:min_samples_split': 11,
  'classifier:random_forest:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'no_encoding',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.000701312346214207,
  'feature_preprocessor:__choice__': 'no_preprocessing',
})
, instance=None, instance_specific=None, seed=1, cutoff=60, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.008530805687203769, time=3.0632126331329346, status=<StatusType.SUCCESS: 1>, starttime=1669293442.6378815, endtime=1669293445.727358, additional_info={'duration': 2.971435546875, 'num_run': 2, 'train_loss': 0.005604857543204056, 'configuration_origin': None})
Passed Configuration: Configuration(values={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:bootstrap': 'True',
  'classifier:random_forest:criterion': 'gini',
  'classifier:random_forest:max_depth': 'None',
  'classifier:random_forest:max_features': 0.7005875707440749,
  'classifier:random_forest:max_leaf_nodes': 'None',
  'classifier:random_forest:min_impurity_decrease': 0.0,
  'classifier:random_forest:min_samples_leaf': 3,
  'classifier:random_forest:min_samples_split': 11,
  'classifier:random_forest:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'no_encoding',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.000701312346214207,
  'feature_preprocessor:__choice__': 'no_preprocessing',
})

Random Forest: RandomForestClassifier(max_features=12, min_samples_leaf=3,
                       min_samples_split=11, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)

Fitting to the training data:   0%|          | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
  warnings.warn(

Fitting to the training data:   1%|          | 1/120 [00:01<01:59,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|1         | 2/120 [00:02<01:58,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   2%|2         | 3/120 [00:03<01:57,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   3%|3         | 4/120 [00:04<01:56,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   4%|4         | 5/120 [00:05<01:55,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   5%|5         | 6/120 [00:06<01:54,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   6%|5         | 7/120 [00:07<01:53,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   7%|6         | 8/120 [00:08<01:52,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|7         | 9/120 [00:09<01:51,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   8%|8         | 10/120 [00:10<01:50,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:   9%|9         | 11/120 [00:11<01:49,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  10%|#         | 12/120 [00:12<01:48,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  11%|#         | 13/120 [00:13<01:47,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  12%|#1        | 14/120 [00:14<01:46,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  12%|#2        | 15/120 [00:15<01:45,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  13%|#3        | 16/120 [00:16<01:44,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  14%|#4        | 17/120 [00:17<01:43,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  15%|#5        | 18/120 [00:18<01:42,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  16%|#5        | 19/120 [00:19<01:41,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  17%|#6        | 20/120 [00:20<01:40,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  18%|#7        | 21/120 [00:21<01:39,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  18%|#8        | 22/120 [00:22<01:38,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  19%|#9        | 23/120 [00:23<01:37,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  20%|##        | 24/120 [00:24<01:36,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  21%|##        | 25/120 [00:25<01:35,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  22%|##1       | 26/120 [00:26<01:34,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  22%|##2       | 27/120 [00:27<01:33,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  23%|##3       | 28/120 [00:28<01:32,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  24%|##4       | 29/120 [00:29<01:31,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  25%|##5       | 30/120 [00:30<01:30,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  26%|##5       | 31/120 [00:31<01:29,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  27%|##6       | 32/120 [00:32<01:28,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  28%|##7       | 33/120 [00:33<01:27,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  28%|##8       | 34/120 [00:34<01:26,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  29%|##9       | 35/120 [00:35<01:25,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  30%|###       | 36/120 [00:36<01:24,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  31%|###       | 37/120 [00:37<01:23,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  32%|###1      | 38/120 [00:38<01:22,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  32%|###2      | 39/120 [00:39<01:21,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  33%|###3      | 40/120 [00:40<01:20,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  34%|###4      | 41/120 [00:41<01:19,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  35%|###5      | 42/120 [00:42<01:18,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  36%|###5      | 43/120 [00:43<01:17,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  37%|###6      | 44/120 [00:44<01:16,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  38%|###7      | 45/120 [00:45<01:15,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  38%|###8      | 46/120 [00:46<01:14,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  39%|###9      | 47/120 [00:47<01:13,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  40%|####      | 48/120 [00:48<01:12,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  41%|####      | 49/120 [00:49<01:11,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  42%|####1     | 50/120 [00:50<01:10,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  42%|####2     | 51/120 [00:51<01:09,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  43%|####3     | 52/120 [00:52<01:08,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  44%|####4     | 53/120 [00:53<01:07,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  45%|####5     | 54/120 [00:54<01:06,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  46%|####5     | 55/120 [00:55<01:05,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  47%|####6     | 56/120 [00:56<01:04,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  48%|####7     | 57/120 [00:57<01:03,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  48%|####8     | 58/120 [00:58<01:02,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  49%|####9     | 59/120 [00:59<01:01,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  50%|#####     | 60/120 [01:00<01:00,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  51%|#####     | 61/120 [01:01<00:59,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  52%|#####1    | 62/120 [01:02<00:58,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  52%|#####2    | 63/120 [01:03<00:57,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  53%|#####3    | 64/120 [01:04<00:56,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  54%|#####4    | 65/120 [01:05<00:55,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  55%|#####5    | 66/120 [01:06<00:54,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  56%|#####5    | 67/120 [01:07<00:53,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  57%|#####6    | 68/120 [01:08<00:52,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  57%|#####7    | 69/120 [01:09<00:51,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  58%|#####8    | 70/120 [01:10<00:50,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  59%|#####9    | 71/120 [01:11<00:49,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  60%|######    | 72/120 [01:12<00:48,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  61%|######    | 73/120 [01:13<00:47,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  62%|######1   | 74/120 [01:14<00:46,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  62%|######2   | 75/120 [01:15<00:45,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  63%|######3   | 76/120 [01:16<00:44,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  64%|######4   | 77/120 [01:17<00:43,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  65%|######5   | 78/120 [01:18<00:42,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  66%|######5   | 79/120 [01:19<00:41,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  67%|######6   | 80/120 [01:20<00:40,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  68%|######7   | 81/120 [01:21<00:39,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  68%|######8   | 82/120 [01:22<00:38,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  69%|######9   | 83/120 [01:23<00:37,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  70%|#######   | 84/120 [01:24<00:36,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  71%|#######   | 85/120 [01:25<00:35,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  72%|#######1  | 86/120 [01:26<00:34,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  72%|#######2  | 87/120 [01:27<00:33,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  73%|#######3  | 88/120 [01:28<00:32,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  74%|#######4  | 89/120 [01:29<00:31,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  75%|#######5  | 90/120 [01:30<00:30,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  76%|#######5  | 91/120 [01:31<00:29,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  77%|#######6  | 92/120 [01:32<00:28,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  78%|#######7  | 93/120 [01:33<00:27,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  78%|#######8  | 94/120 [01:34<00:26,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  79%|#######9  | 95/120 [01:35<00:25,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  80%|########  | 96/120 [01:36<00:24,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  81%|########  | 97/120 [01:37<00:23,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  82%|########1 | 98/120 [01:38<00:22,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  82%|########2 | 99/120 [01:39<00:21,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  83%|########3 | 100/120 [01:40<00:20,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  84%|########4 | 101/120 [01:41<00:19,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  85%|########5 | 102/120 [01:42<00:18,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  86%|########5 | 103/120 [01:43<00:17,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  87%|########6 | 104/120 [01:44<00:16,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  88%|########7 | 105/120 [01:45<00:15,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  88%|########8 | 106/120 [01:46<00:14,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  89%|########9 | 107/120 [01:47<00:13,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  90%|######### | 108/120 [01:48<00:12,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  91%|######### | 109/120 [01:49<00:11,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data:  92%|#########1| 110/120 [01:50<00:10,  1.00s/it, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [01:50<00:00,  1.09it/s, The total time budget for this task is 0:02:00]

Total running time of the script: ( 2 minutes 22.789 seconds)

Gallery generated by Sphinx-Gallery