Classification

The following example shows how to fit a simple classification model with auto-sklearn.

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

Data loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
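
The split above uses scikit-learn's defaults (75% train, 25% test). If the class distribution matters for your problem, a stratified split keeps the class ratios similar in both partitions; the variant below is an optional sketch, not part of the original example.

# Optional variant: stratify on y so train and test keep the same class ratios.
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1, stratify=y
)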

Build and fit a classifier

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_classification_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

Out:

AutoSklearnClassifier(per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_classification_example_tmp')
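
After fitting, auto-sklearn can report a short summary of the search. The snippet below is a small sketch using sprint_statistics(), which returns a human-readable string (dataset name, metric, best validation score, and counts of successful, crashed, and timed-out runs); the exact wording depends on the installed version.

# Sketch: print a text summary of the search (metric, best validation score,
# number of successful / crashed / timed-out target algorithm runs).
print(automl.sprint_statistics())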

View the models found by auto-sklearn

print(automl.leaderboard())

Out:

          rank  ensemble_weight                 type      cost  duration
model_id
7            1             0.08          extra_trees  0.014184  1.386039
16           2             0.06    gradient_boosting  0.021277  0.908949
21           3             0.02          extra_trees  0.021277  1.240055
2            4             0.04        random_forest  0.028369  1.507876
3            5             0.06                  mlp  0.028369  0.819861
22           6             0.02    gradient_boosting  0.028369  0.976891
10           7             0.06        random_forest  0.028369  1.648257
11           8             0.04        random_forest  0.028369  1.878659
13           9             0.04    gradient_boosting  0.028369  1.238531
26          10             0.06          extra_trees  0.028369  2.249340
19          11             0.08          extra_trees  0.028369  2.809216
27          12             0.06          extra_trees  0.028369  7.942544
8           13             0.02        random_forest  0.035461  1.692722
17          14             0.02    gradient_boosting  0.035461  1.413625
25          15             0.02             adaboost  0.042553  1.828053
9           16             0.02          extra_trees  0.042553  1.613578
30          17             0.04             adaboost  0.049645  0.595418
34          18             0.04          extra_trees  0.049645  1.247910
23          19             0.02                  mlp  0.049645  1.978261
15          20             0.04                  mlp  0.049645  3.234449
33          21             0.06        decision_tree  0.056738  0.868145
31          22             0.02          gaussian_nb  0.056738  0.643746
24          23             0.02        random_forest  0.070922  1.500164
20          24             0.04   passive_aggressive  0.078014  0.634079
32          25             0.02  k_nearest_neighbors  0.092199  0.678369
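
The members of the final ensemble can also be inspected directly. As a sketch, show_models() describes each ensemble member and its weight (in recent versions it returns a dictionary keyed by model id), which is what the pprint import at the top is useful for; the exact structure of the returned value varies across auto-sklearn versions.

# Sketch: dump the ensemble members; pprint keeps the nested structure readable.
pprint(automl.show_models(), indent=4)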

Get the score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Out:

Accuracy score: 0.958041958041958
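
Accuracy is only one view of performance. Since predict() returns hard labels and predict_proba() returns class probabilities, any scikit-learn metric can be applied to the held-out test set; the snippet below is a sketch computing a confusion matrix and ROC AUC for this binary problem.

# Sketch: additional scikit-learn metrics on the test set.
# predict_proba returns one column per class; column 1 is the positive class.
probabilities = automl.predict_proba(X_test)
print("Confusion matrix:\n", sklearn.metrics.confusion_matrix(y_test, predictions))
print("ROC AUC:", sklearn.metrics.roc_auc_score(y_test, probabilities[:, 1]))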

Total running time of the script: 1 minute 57.665 seconds
