Regression
The following example shows how to fit a simple regression model with auto-sklearn.
from pprint import pprint
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection  # needed for train_test_split below
import autosklearn.regression
import matplotlib.pyplot as plt
Data Loading
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
Build and fit a regressor
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_regression_example_tmp",
)
automl.fit(X_train, y_train, dataset_name="diabetes")
Fitting to the training data: 100%|##########| 120/120 [01:54<00:00, 1.05it/s, The total time budget for this task is 0:02:00]
AutoSklearnRegressor(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
per_run_time_limit=30, time_left_for_this_task=120,
tmp_folder='/tmp/autosklearn_regression_example_tmp')
View the models found by auto-sklearn
print(automl.leaderboard())
          rank  ensemble_weight               type      cost  duration
model_id
25           1             0.46                sgd  0.436679  0.749041
6            2             0.32     ard_regression  0.455042  0.733654
27           3             0.14     ard_regression  0.462249  0.773407
11           4             0.02      random_forest  0.507400  9.749653
7            5             0.06  gradient_boosting  0.518673  1.375675
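The ensemble_weight column in the leaderboard can be read as each model's share in the final prediction. The following is a minimal sketch, assuming the ensemble prediction is a weighted average of the member predictions. The weights are copied from the leaderboard above, while the per-model predictions are made-up numbers for a single hypothetical sample.

```python
# Ensemble weights copied from the leaderboard above, keyed by model_id.
weights = {25: 0.46, 6: 0.32, 27: 0.14, 11: 0.02, 7: 0.06}

# Hypothetical predictions of each model for one sample (illustrative only).
member_predictions = {25: 150.0, 6: 160.0, 27: 155.0, 11: 170.0, 7: 140.0}

# The weights of the five ensemble members add up to 1.
total_weight = sum(weights.values())

# Assumed combination rule: weighted average of the member predictions.
ensemble_prediction = sum(
    weights[model_id] * member_predictions[model_id] for model_id in weights
)

print("Sum of weights:", round(total_weight, 2))
print("Ensemble prediction:", round(ensemble_prediction, 2))
```

Under this assumption, a model with a small weight such as the random forest (0.02) barely moves the combined prediction, even though it was by far the slowest to train.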
Print the final ensemble constructed by auto-sklearn
pprint(automl.show_models(), indent=4)
{ 6: { 'cost': 0.4550418898836528,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2afd4e3370>,
'ensemble_weight': 0.32,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2afcd2f850>,
'model_id': 6,
'rank': 2,
'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f2afcd2f280>,
'sklearn_regressor': ARDRegression(alpha_1=0.0003701926442639788, alpha_2=2.2118001735899097e-07,
copy_X=False, lambda_1=1.2037591637980971e-06,
lambda_2=4.358378124977852e-09,
threshold_lambda=1136.5286041327277, tol=0.021944240404849075)},
7: { 'cost': 0.5186726734789994,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2af47ed430>,
'ensemble_weight': 0.06,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2af76a2bb0>,
'model_id': 7,
'rank': 14,
'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f2af76a28e0>,
'sklearn_regressor': HistGradientBoostingRegressor(l2_regularization=1.8428972335335263e-10,
learning_rate=0.012607824914758717, max_iter=512,
max_leaf_nodes=10, min_samples_leaf=8,
n_iter_no_change=0, random_state=1,
validation_fraction=None, warm_start=True)},
11: { 'cost': 0.5073997164657239,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2b11083e80>,
'ensemble_weight': 0.02,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2af7227220>,
'model_id': 11,
'rank': 11,
'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f2af72270d0>,
'sklearn_regressor': RandomForestRegressor(bootstrap=False, criterion='mae',
max_features=0.6277363920171745, min_samples_leaf=6,
min_samples_split=15, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
25: { 'cost': 0.43667876507897496,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2af6cad2e0>,
'ensemble_weight': 0.46,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2afd3e1730>,
'model_id': 25,
'rank': 1,
'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f2afd3e13a0>,
'sklearn_regressor': SGDRegressor(alpha=0.0006517033225329654, epsilon=0.012150149892783745,
eta0=0.016444224834275295, l1_ratio=1.7462342366289323e-09,
loss='epsilon_insensitive', max_iter=16, penalty='elasticnet',
power_t=0.21521743568582094, random_state=1,
tol=0.002431731981071206, warm_start=True)},
27: { 'cost': 0.4622486119001967,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f2afd67c2e0>,
'ensemble_weight': 0.14,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f2af76f4a30>,
'model_id': 27,
'rank': 7,
'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f2af76f4b80>,
'sklearn_regressor': ARDRegression(alpha_1=2.7664515192592053e-05, alpha_2=9.504988116581138e-07,
copy_X=False, lambda_1=6.50650698230178e-09,
lambda_2=4.238533890074848e-07,
threshold_lambda=78251.58542976103, tol=0.0007301343236220855)}}
Get the Score of the final ensemble
After training the estimator, we can quantify the goodness of fit. One option is the R2 score (the coefficient of determination). Its values range from -inf to 1, with 1 being the best possible value. A dummy estimator that always predicts the mean of the data has an R2 score of 0.
train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
Train R2 score: 0.5944780427522034
Test R2 score: 0.3959585042866587
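To make the range of the R2 score concrete, here is a small sketch that computes it from its usual definition, 1 - SS_res / SS_tot, in plain Python. The helper name r2_from_scratch and the toy targets are hypothetical and only serve as illustration; the example above uses sklearn.metrics.r2_score.

```python
def r2_from_scratch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean target.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets, not taken from the diabetes dataset.
y_true = [100.0, 150.0, 200.0, 250.0]
mean_value = sum(y_true) / len(y_true)

perfect = r2_from_scratch(y_true, y_true)  # every prediction is exact
dummy = r2_from_scratch(y_true, [mean_value] * len(y_true))  # always the mean
worse = r2_from_scratch(y_true, [250.0, 200.0, 150.0, 100.0])  # reversed order

print(perfect, dummy, worse)  # prints: 1.0 0.0 -3.0
```

The third case shows why the score has no lower bound: predictions that are worse than the mean make SS_res larger than SS_tot, which pushes the score below 0.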
Plot the predictions
We can also inspect the predictions visually. We plot the true values against the predictions for both the train and the test data. Points on the diagonal are perfect predictions. Points below the diagonal were overestimated by the model (the predicted value is higher than the true value), and points above the diagonal were underestimated (the predicted value is lower than the true value).
plt.scatter(train_predictions, y_train, label="Train samples", c="#d95f02")
plt.scatter(test_predictions, y_test, label="Test samples", c="#7570b3")
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()
plt.plot([30, 400], [30, 400], c="k", zorder=0)
plt.xlim([30, 400])
plt.ylim([30, 400])
plt.tight_layout()
plt.show()
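The reading of the plot described above can also be expressed numerically. The sketch below counts how many points fall below, above, or on the diagonal by comparing each prediction with its true value. The function name classify_predictions and the sample values are hypothetical and not part of auto-sklearn.

```python
def classify_predictions(y_true, y_pred):
    """Count predictions that are above, below, or equal to the true value."""
    counts = {"overestimated": 0, "underestimated": 0, "exact": 0}
    for true_value, predicted in zip(y_true, y_pred):
        if predicted > true_value:
            counts["overestimated"] += 1  # point lies below the diagonal
        elif predicted < true_value:
            counts["underestimated"] += 1  # point lies above the diagonal
        else:
            counts["exact"] += 1  # point lies on the diagonal
    return counts


# Hypothetical true values and predictions (illustrative only).
y_true = [120.0, 200.0, 90.0, 310.0]
y_pred = [150.0, 180.0, 90.0, 250.0]

counts = classify_predictions(y_true, y_pred)
print(counts)  # prints: {'overestimated': 1, 'underestimated': 2, 'exact': 1}
```

With the arrays from the example, the same function could be called as classify_predictions(y_test, test_predictions) to check whether the model tends to over- or underestimate on the test data.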
Total running time of the script: (2 minutes 0.470 seconds)