Classification

The following example shows how to fit a simple classification model with auto-sklearn.

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

Data loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
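Since the split above passes neither test_size nor train_size, it may help to see how many samples end up on each side. The sketch below assumes scikit-learn's documented default of a 25% test set and that a fractional test size is rounded up; the helper name default_split_sizes is hypothetical and only illustrates the arithmetic.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size.

    Assumption: the test count is rounded up and the training set
    receives whatever remains.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# The breast cancer dataset contains 569 samples.
n_train, n_test = default_split_sizes(569)
print(n_train, n_test)
```

Under these assumptions, auto-sklearn searches on the training portion only, and the test portion is held back for the final accuracy check.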

Build and fit a classifier

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_classification_example_tmp",
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_classification_example_tmp')

View the models found by auto-sklearn

print(automl.leaderboard())
          rank  ensemble_weight                 type      cost   duration
model_id
7            1             0.16          extra_trees  0.014184   1.569340
27           2             0.04          extra_trees  0.014184   2.449368
16           4             0.04    gradient_boosting  0.021277   1.235045
21           5             0.06          extra_trees  0.021277   1.586606
30           3             0.04          extra_trees  0.021277  12.410941
2            6             0.02        random_forest  0.028369   1.892178
3            7             0.08                  mlp  0.028369   1.077336
6            8             0.02                  mlp  0.028369   1.222855
11           9             0.02        random_forest  0.028369   2.290498
14          11             0.02                  mlp  0.028369   2.054393
22          10             0.06    gradient_boosting  0.028369   1.379215
5           16             0.02        random_forest  0.035461   2.209646
8           15             0.02        random_forest  0.035461   2.130122
12          14             0.02    gradient_boosting  0.035461   1.431612
18          13             0.02        random_forest  0.035461   2.392527
31          12             0.08        random_forest  0.035461   1.798755
9           17             0.04          extra_trees  0.042553   1.930847
28          19             0.08         bernoulli_nb  0.070922   1.001414
33          18             0.02        decision_tree  0.070922   8.978891
34          20             0.02  k_nearest_neighbors  0.070922   0.897243
20          22             0.02   passive_aggressive  0.078014   0.737455
32          21             0.02    gradient_boosting  0.078014   1.069623
29          23             0.08                  mlp  0.134752   2.241808
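The ensemble_weight column indicates how much each model contributes to the final prediction. As a rough illustration only, and not auto-sklearn's internal code, the sketch below combines per-model class probabilities with a weighted average. The function name weighted_vote and the probability values are hypothetical; only the two weights are taken from the leaderboard.

```python
def weighted_vote(probas, weights):
    """Weighted average of per-model class-probability vectors for one sample."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]


# Weights of the two top-ranked models in the leaderboard above.
weights = [0.16, 0.04]
# Hypothetical class probabilities for a single sample (not from the source).
probas = [[0.9, 0.1], [0.4, 0.6]]

combined = weighted_vote(probas, weights)
# The predicted class is the index with the highest combined probability.
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted_class)
```

Dividing by the sum of the weights keeps the result a valid probability vector even when only a subset of the ensemble is used, as in this two-model example.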

Get the score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.958041958041958
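Accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below recomputes it in plain Python. The labels are synthetic, and the counts (137 correct out of 143) are an inference from the reported score and an assumed 25% test split of 569 samples; they are not stated in the output above.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Synthetic labels: 143 samples, of which the last 6 are misclassified.
y_true_demo = [1] * 143
y_pred_demo = [1] * 137 + [0] * 6

score = accuracy(y_true_demo, y_pred_demo)
print("Accuracy score:", score)
```

For 0/1 labels like these, this is the same quantity that sklearn.metrics.accuracy_score reports by default.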

Total running time of the script: (2 minutes 1.815 seconds)
