Performance-over-time plot
This example shows how to use the performance_over_time_ attribute to plot performance over training time. performance_over_time_ is a pandas DataFrame that can contain multiple metrics, namely:
ensemble_optimization_score
ensemble_test_score
single_best_optimization_score
single_best_test_score
single_best_train_score
auto-sklearn can automatically encode categorical columns using a label/ordinal encoder. This example highlights how to properly set the dtype in a DataFrame for this to happen, and shows how to also pass test data to auto-sklearn.
The X_train/y_train arguments to the fit function are used to fit the scikit-learn model, whereas X_test/y_test are used to evaluate how well this model generalizes to unseen data (i.e. data not in X_train/y_train). Using test data is a good way to measure whether the trained model suffers from overfitting; more details can be found in the scikit-learn documentation on evaluating estimator performance.
To obtain the *_test_score metrics, X_test and y_test must be provided to the AutoML model, as shown in this example.
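As a rough illustration of that requirement, the sketch below checks which test-score columns a performance table exposes. The DataFrame is a hand-made stand-in that reuses the column names listed above plus Timestamp; its values are invented, and the helper name find_test_score_columns is hypothetical rather than part of auto-sklearn. Whether the test columns are absent or merely empty when no test data is passed is not stated here, so the helper only reports which columns are present.

```python
import pandas as pd


def find_test_score_columns(performance_over_time):
    """Return the names of the columns that hold test-set scores."""
    return [
        name
        for name in performance_over_time.columns
        if name.endswith("_test_score")
    ]


# Hypothetical stand-in for cls.performance_over_time_ (values are made up).
pot = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2024-01-01 12:00:00", "2024-01-01 12:00:10"]
        ),
        "ensemble_optimization_score": [0.80, 0.84],
        "ensemble_test_score": [0.79, 0.83],
        "single_best_optimization_score": [0.80, 0.82],
        "single_best_test_score": [0.78, 0.81],
        "single_best_train_score": [0.90, 0.93],
    }
)

print(find_test_score_columns(pot))
```

With a real run, the same helper could be called on cls.performance_over_time_ after fitting.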
It is also possible to indicate the feature types manually (whether a column is categorical or numerical) via the feat_type argument of fit(). This is important when working with lists or numpy arrays, because they carry no per-column dtype (further details are in the Feature Types example).
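A minimal sketch of how such a feature-type list could be derived from DataFrame dtypes. The helper name build_feat_type is hypothetical, and the string labels "Categorical" and "Numerical" are assumptions that should be checked against the Feature Types example before use.

```python
import pandas as pd


def build_feat_type(df):
    """Label each column "Categorical" or "Numerical" based on its dtype."""
    return [
        "Categorical" if dtype.name == "category" else "Numerical"
        for dtype in df.dtypes
    ]


# Small illustrative frame; the values are made up.
toy = pd.DataFrame(
    {
        "A2": [22.08, 22.67, 29.58],
        "A4": pd.Series(["1", "2", "1"], dtype="category"),
    }
)

feat_type = build_feat_type(toy)
print(feat_type)  # ['Numerical', 'Categorical']
```

The resulting list has one entry per column, in column order, which is the shape a per-column feature-type argument would need.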
import time
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
from smac.tae import StatusType
import autosklearn.classification
Data Loading
# Using the Australian dataset https://www.openml.org/d/40981.
# This example uses fetch_openml, which downloads a properly
# formatted DataFrame if you pass as_frame=True.
# For demonstration purposes, we download a numpy array using
# as_frame=False and manually create the pandas DataFrame.
X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=False)
# bool and category columns will be automatically encoded.
# Targets for classification are also automatically encoded.
# If using fetch_openml, the data is already properly encoded;
# the code below is an example for user reference.
X = pd.DataFrame(data=X, columns=["A" + str(i) for i in range(1, 15)])
desired_boolean_columns = ["A1"]
desired_categorical_columns = ["A4", "A5", "A6", "A8", "A9", "A11", "A12"]
desired_numerical_columns = ["A2", "A3", "A7", "A10", "A13", "A14"]
for column in X.columns:
    if column in desired_boolean_columns:
        X[column] = X[column].astype("bool")
    elif column in desired_categorical_columns:
        X[column] = X[column].astype("category")
    else:
        X[column] = pd.to_numeric(X[column])
y = pd.DataFrame(y, dtype="category")
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=3
)
print(X.dtypes)
A1 bool
A2 float64
A3 float64
A4 category
A5 category
A6 category
A7 float64
A8 category
A9 category
A10 float64
A11 category
A12 category
A13 float64
A14 float64
dtype: object
Build and fit a classifier
cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
)
cls.fit(X_train, y_train, X_test, y_test)
Fitting to the training data: 0%| | 0/120 [00:00<?, ?it/s, The total time budget for this task is 0:02:00]
Fitting to the training data: 100%|##########| 120/120 [01:53<00:00, 1.06it/s, The total time budget for this task is 0:02:00]
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
per_run_time_limit=30, time_left_for_this_task=120)
Get the Score of the final ensemble
predictions = cls.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score 0.8608695652173913
Plot the ensemble performance
The performance_over_time_ attribute returns a pandas DataFrame, which can be used directly for plotting.
poT = cls.performance_over_time_
poT.plot(
    x="Timestamp",
    kind="line",
    legend=True,
    title="Auto-sklearn accuracy over time",
    grid=True,
)
plt.show()
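Beyond plotting, the same table can be summarized numerically. The sketch below reports the best value of one metric and how many seconds after the first row it was first reached. It runs on a hand-made stand-in with invented values, the helper name best_score_and_elapsed is hypothetical, and it assumes the Timestamp column holds datetime values sorted in ascending order.

```python
import pandas as pd


def best_score_and_elapsed(performance_over_time, column):
    """Return (best score, seconds from the first row until it was first reached)."""
    # Ignore rows where the metric has not been recorded yet.
    scored = performance_over_time.dropna(subset=[column])
    # idxmax returns the label of the first occurrence of the maximum.
    best_row = scored.loc[scored[column].idxmax()]
    start = performance_over_time["Timestamp"].iloc[0]
    elapsed = (best_row["Timestamp"] - start).total_seconds()
    return float(best_row[column]), elapsed


# Hypothetical stand-in for cls.performance_over_time_ (values are made up).
pot = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2024-01-01 12:00:00",
                "2024-01-01 12:00:10",
                "2024-01-01 12:00:25",
                "2024-01-01 12:00:40",
            ]
        ),
        "ensemble_optimization_score": [0.80, 0.84, 0.86, 0.86],
        "ensemble_test_score": [0.79, 0.83, 0.85, 0.84],
    }
)

best, seconds = best_score_and_elapsed(pot, "ensemble_optimization_score")
print(f"best={best:.2f} reached after {seconds:.0f} s")
```

The helper raises an error if the chosen column contains no recorded values, so it is meant to be called only after a fit that produced at least one score.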
Total running time of the script: (1 minute 59.974 seconds)