Sequential Usage
By default, auto-sklearn fits the machine learning models and builds their ensembles in parallel. However, the two steps can also be run sequentially. The example below first fits the models and then builds the ensemble.
from pprint import pprint
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
Data Loading
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
Build and fit the classifier
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_sequential_example_tmp',
    # Do not construct ensembles in parallel, to avoid using more than one
    # core at a time. The ensemble is constructed after auto-sklearn has
    # finished fitting all machine learning models.
    ensemble_size=0,
    delete_tmp_folder_after_terminate=False,
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')
# fit_ensemble builds an ensemble from all models trained by the previous
# call to fit; automl.predict() then uses that ensemble.
automl.fit_ensemble(y_train, ensemble_size=50)
Out:
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/metalearning/metalearning/meta_base.py:68: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self.metafeatures = self.metafeatures.append(metafeatures)
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/metalearning/metalearning/meta_base.py:72: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
self.algorithm_runs[metric].append(runs)
AutoSklearnClassifier(delete_tmp_folder_after_terminate=False, ensemble_size=0,
per_run_time_limit=30, time_left_for_this_task=120,
tmp_folder='/tmp/autosklearn_sequential_example_tmp')
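To make the ensemble-building step more concrete, the following is a minimal sketch of greedy ensemble selection with replacement, the general technique auto-sklearn's ensembles are based on. It is a toy pure-Python illustration, not auto-sklearn's implementation: the function names, the 0.5 threshold, and the validation data are all hypothetical.

```python
from collections import Counter


def accuracy_error(avg_probs, y_true):
    """Fraction of samples misclassified when thresholding at 0.5."""
    wrong = sum((p >= 0.5) != bool(t) for p, t in zip(avg_probs, y_true))
    return wrong / len(y_true)


def greedy_ensemble_selection(model_probs, y_true, ensemble_size):
    """Add one model per step, with replacement, minimising validation error.

    model_probs maps a (hypothetical) model id to its predicted
    positive-class probabilities on a held-out validation set.
    Returns a dict of model id -> ensemble weight.
    """
    running_sum = [0.0] * len(y_true)  # summed probabilities of chosen members
    chosen = []
    for step in range(1, ensemble_size + 1):
        best_id, best_err = None, None
        for model_id, probs in model_probs.items():
            # Averaged prediction if this model were added to the ensemble.
            candidate = [(s + p) / step for s, p in zip(running_sum, probs)]
            err = accuracy_error(candidate, y_true)
            if best_err is None or err < best_err:
                best_id, best_err = model_id, err
        chosen.append(best_id)
        running_sum = [s + p for s, p in zip(running_sum, model_probs[best_id])]
    counts = Counter(chosen)
    return {model_id: n / ensemble_size for model_id, n in counts.items()}


# Toy validation labels and per-model probabilities (made up for illustration).
y_val = [1, 0, 1, 1, 0, 0]
toy_probs = {
    "a": [0.9, 0.2, 0.4, 0.8, 0.3, 0.6],
    "b": [0.6, 0.4, 0.7, 0.3, 0.1, 0.2],
    "c": [0.2, 0.9, 0.6, 0.7, 0.8, 0.4],
}
weights = greedy_ensemble_selection(toy_probs, y_val, ensemble_size=10)
print(weights)
```

Because a model can be picked more than once, each weight is a multiple of 1 / ensemble_size, and the weights sum to 1.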
Print the final ensemble constructed by auto-sklearn
pprint(automl.show_models(), indent=4)
Out:
{ 2: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e73597c0>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e5956e80>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e7359610>,
'model_id': 2,
'rank': 4,
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
3: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebb8ee80>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e3596700>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ebb8e5b0>,
'model_id': 3,
'rank': 5,
'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0001363185819149026, beta_1=0.999,
beta_2=0.9, early_stopping=True,
hidden_layer_sizes=(115, 115, 115),
learning_rate_init=0.00018009776276177523, max_iter=32,
n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
7: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ec151c70>,
'cost': 0.014184397163120588,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ef3fba60>,
'ensemble_weight': 0.08,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ec151160>,
'model_id': 7,
'rank': 1,
'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)},
8: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb1028f8ac0>,
'cost': 0.03546099290780147,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e516d160>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0eda89670>,
'model_id': 8,
'rank': 13,
'sklearn_classifier': RandomForestClassifier(max_features=2, min_samples_leaf=2, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
9: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e7089460>,
'cost': 0.04255319148936165,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ec021b80>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ef7a5b50>,
'model_id': 9,
'rank': 15,
'sklearn_classifier': ExtraTreesClassifier(max_features=9, min_samples_split=10, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
10: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e33190d0>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ea5aa8b0>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e3319dc0>,
'model_id': 10,
'rank': 6,
'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=4, min_samples_split=6,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)},
11: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e988c340>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0edb94a30>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e988c5e0>,
'model_id': 11,
'rank': 7,
'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=23, min_samples_leaf=7,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)},
13: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0eda78550>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ef339640>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0eac71c10>,
'model_id': 13,
'rank': 8,
'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=False,
l2_regularization=1.0647401999412075e-10,
learning_rate=0.08291320147381159, max_iter=512,
max_leaf_nodes=39, n_iter_no_change=0,
random_state=1, validation_fraction=None,
warm_start=True)},
15: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebf9fd90>,
'cost': 0.049645390070921946,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ea1d38b0>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ebf9f970>,
'model_id': 15,
'rank': 17,
'sklearn_classifier': MLPClassifier(alpha=4.2841884333778574e-06, beta_1=0.999, beta_2=0.9,
hidden_layer_sizes=(263, 263, 263),
learning_rate_init=0.0011804284312897009, max_iter=128,
n_iter_no_change=32, random_state=1, validation_fraction=0.0,
verbose=0, warm_start=True)},
16: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e98aa670>,
'cost': 0.021276595744680882,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ec151940>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e98aa490>,
'model_id': 16,
'rank': 2,
'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=3.387912939529945e-10,
learning_rate=0.30755227194768237, max_iter=128,
max_leaf_nodes=60, min_samples_leaf=39,
n_iter_no_change=18, random_state=1,
validation_fraction=None, warm_start=True)},
17: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0eb41a3d0>,
'cost': 0.03546099290780147,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0eaa0b760>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0edc49790>,
'model_id': 17,
'rank': 14,
'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=0.4635442279519353,
learning_rate=0.09809681787962342, max_iter=512,
max_leaf_nodes=328, min_samples_leaf=2,
n_iter_no_change=2, random_state=1,
validation_fraction=None, warm_start=True)},
19: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0edb9d250>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0eb0edbe0>,
'ensemble_weight': 0.08,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0edb9d460>,
'model_id': 19,
'rank': 9,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=448, min_samples_leaf=2,
min_samples_split=20, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
20: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ea121100>,
'cost': 0.07801418439716312,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ebff5c40>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e5988430>,
'model_id': 20,
'rank': 24,
'sklearn_classifier': PassiveAggressiveClassifier(C=0.14268277711454813, max_iter=32, random_state=1,
tol=0.0002600768160857831, warm_start=True)},
21: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebff5fa0>,
'cost': 0.021276595744680882,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e58da760>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ebff5fd0>,
'model_id': 21,
'rank': 3,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=4, min_samples_leaf=2,
min_samples_split=15, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
22: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e5956fd0>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e58bc760>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0eb892d30>,
'model_id': 22,
'rank': 10,
'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=True,
l2_regularization=8.057778875694463e-05,
learning_rate=0.09179220974965213, max_iter=256,
max_leaf_nodes=200, n_iter_no_change=18,
random_state=1,
validation_fraction=0.14295295806077554,
warm_start=True)},
23: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ed9a5610>,
'cost': 0.049645390070921946,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0eb708fd0>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ed9a55b0>,
'model_id': 23,
'rank': 18,
'sklearn_classifier': MLPClassifier(alpha=0.02847755502162456, beta_1=0.999, beta_2=0.9,
hidden_layer_sizes=(123, 123),
learning_rate_init=0.000421568792103947, max_iter=256,
n_iter_no_change=32, random_state=1, validation_fraction=0.0,
verbose=0, warm_start=True)},
24: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e73b2940>,
'cost': 0.07092198581560283,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e5938250>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e3b55c70>,
'model_id': 24,
'rank': 23,
'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=16, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
25: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebf9f4f0>,
'cost': 0.04255319148936165,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0eb0bd700>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e70892e0>,
'model_id': 25,
'rank': 16,
'sklearn_classifier': AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=3),
learning_rate=0.046269426995092074, n_estimators=406,
random_state=1)},
26: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebce1220>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ed970b80>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ebce1400>,
'model_id': 26,
'rank': 11,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=414, min_samples_leaf=2,
min_samples_split=19, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
27: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e364cd30>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ebff54c0>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e6436b80>,
'model_id': 27,
'rank': 12,
'sklearn_classifier': ExtraTreesClassifier(bootstrap=True, criterion='entropy', max_features=4126,
min_samples_leaf=7, min_samples_split=13, n_estimators=512,
n_jobs=1, random_state=1, warm_start=True)},
30: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ef38ac40>,
'cost': 0.049645390070921946,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e9ba21c0>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ef38a1f0>,
'model_id': 30,
'rank': 19,
'sklearn_classifier': AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=10),
learning_rate=0.021459464491641638, n_estimators=374,
random_state=1)},
31: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e516dac0>,
'cost': 0.05673758865248224,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0eaabed00>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e974af40>,
'model_id': 31,
'rank': 21,
'sklearn_classifier': GaussianNB()},
32: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0e3e3ceb0>,
'cost': 0.09219858156028371,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0ea9453d0>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0e3e3cac0>,
'model_id': 32,
'rank': 25,
'sklearn_classifier': KNeighborsClassifier(n_neighbors=6)},
33: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0ebe11d30>,
'cost': 0.05673758865248224,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e9bd1f40>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0ec174130>,
'model_id': 33,
'rank': 22,
'sklearn_classifier': DecisionTreeClassifier(class_weight='balanced', criterion='entropy',
max_depth=465, min_samples_leaf=16, min_samples_split=10,
random_state=1)},
34: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fb0eb6deb80>,
'cost': 0.049645390070921946,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fb0e9fa65b0>,
'ensemble_weight': 0.04,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fb0eb6deac0>,
'model_id': 34,
'rank': 20,
'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=3, min_samples_leaf=7,
min_samples_split=14, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)}}
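The dictionary above is long, so a short summary can help when reading it. The sketch below copies the model_id, rank and ensemble_weight values from that output into a plain list and checks two things: that the weights sum to 1, and how many of the 50 ensemble slots each model would occupy. The second check assumes that each weight equals the number of times a model was selected divided by ensemble_size; the output is consistent with this, but the source does not state it.

```python
ENSEMBLE_SIZE = 50  # value passed to fit_ensemble() in this example

# (model_id, rank, ensemble_weight), copied from the show_models() output above.
models = [
    (2, 4, 0.04), (3, 5, 0.06), (7, 1, 0.08), (8, 13, 0.02), (9, 15, 0.02),
    (10, 6, 0.06), (11, 7, 0.04), (13, 8, 0.04), (15, 17, 0.04), (16, 2, 0.06),
    (17, 14, 0.02), (19, 9, 0.08), (20, 24, 0.04), (21, 3, 0.02), (22, 10, 0.02),
    (23, 18, 0.02), (24, 23, 0.02), (25, 16, 0.02), (26, 11, 0.06), (27, 12, 0.06),
    (30, 19, 0.04), (31, 21, 0.02), (32, 25, 0.02), (33, 22, 0.06), (34, 20, 0.04),
]

total_weight = sum(weight for _, _, weight in models)

# Assumption: weight == (times selected) / ENSEMBLE_SIZE.
selection_counts = {
    model_id: round(weight * ENSEMBLE_SIZE) for model_id, _, weight in models
}

# The model with rank 1 has the lowest validation cost.
best_model_id = min(models, key=lambda row: row[1])[0]

print(f"{len(models)} models, total weight {total_weight:.2f}")
print("slots filled:", sum(selection_counts.values()))
print("rank-1 model:", best_model_id)
```

In a live session the same summary could be computed by iterating over automl.show_models().items() and reading the 'rank' and 'ensemble_weight' keys shown in the output.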
Get the Score of the final ensemble
predictions = automl.predict(X_test)
print(automl.sprint_statistics())
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
Out:
auto-sklearn results:
Dataset name: breast_cancer
Metric: accuracy
Best validation score: 0.985816
Number of target algorithm runs: 34
Number of successful target algorithm runs: 33
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 1
Number of target algorithms that exceeded the memory limit: 0
Accuracy score 0.958041958041958
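The numbers reported above can be cross-checked with a little arithmetic. The sketch below assumes that, for the accuracy metric, the 'cost' shown by show_models() is 1 minus the validation accuracy; under that assumption the lowest cost (model 7) reproduces the reported best validation score. It also recovers the simplest fractions behind the cost and the test accuracy, which suggests (as an inference, not something the source states) how many samples were misclassified.

```python
from fractions import Fraction

best_cost = 0.014184397163120588   # 'cost' of the rank-1 model (model_id 7)
test_accuracy = 0.958041958041958  # value printed as "Accuracy score"

# Assumption: cost = 1 - validation accuracy for the accuracy metric.
validation_score = 1.0 - best_cost
print(f"{validation_score:.6f}")   # prints 0.985816

# Closest fractions with a small denominator, i.e. errors / samples
# for the cost and correct / samples for the accuracy.
cost_fraction = Fraction(best_cost).limit_denominator(1000)
accuracy_fraction = Fraction(test_accuracy).limit_denominator(1000)
print(cost_fraction, accuracy_fraction)  # prints 2/141 137/143
```

Read this way, the best single model made 2 errors on a 141-sample validation split, and the ensemble classified 137 of the 143 test samples correctly.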
Total running time of the script: (2 minutes 3.305 seconds)