The following example shows how to fit a simple regression model with auto-sklearn.

from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.regression
import matplotlib.pyplot as plt

Data Loading

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
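`train_test_split` is called here without `test_size`, so it holds out 25% of the samples for testing and rounds the test count up. The plain-Python sketch below reproduces that arithmetic for the 442-sample diabetes dataset and shows a seeded shuffle-and-split. The helpers `split_sizes` and `shuffle_split` are hypothetical illustrations, not scikit-learn functions. scikit-learn uses NumPy's random generator, so the indices it picks will differ from these.

```python
import math
import random


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test): the test count is rounded up,
    and every remaining sample goes to the training set."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def shuffle_split(n_samples, test_size=0.25, seed=1):
    """Hypothetical helper: shuffle the row indices with a seeded RNG
    and cut them into disjoint train/test index lists."""
    n_train, _ = split_sizes(n_samples, test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = shuffle_split(442)
print(split_sizes(442))               # (331, 111)
print(len(train_idx), len(test_idx))  # 331 111
```

A fixed seed, like `random_state=1` in the example, makes the split reproducible across runs.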

Build and fit a regressor

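automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,  # total time budget for the search, in seconds
    per_run_time_limit=30,        # time limit for a single model fit, in seconds
)
automl.fit(X_train, y_train, dataset_name='diabetes')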



AutoSklearnRegressor(per_run_time_limit=30, time_left_for_this_task=120,

View the models found by auto-sklearn

print(automl.leaderboard())

          rank  ensemble_weight               type      cost  duration
model_id
25           1             0.46                sgd  0.436679  0.572129
6            2             0.32     ard_regression  0.455042  0.588294
27           3             0.14     ard_regression  0.462249  0.572937
11           4             0.02      random_forest  0.507400  9.819607
7            5             0.06  gradient_boosting  0.518673  1.036891
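The `ensemble_weight` column gives each model's share of the final ensemble. The weights in the table sum to 1.0. Below is a minimal sketch of how such weights combine member predictions, assuming the ensemble takes a plain weighted average. `ensemble_predict` is a hypothetical helper, not part of auto-sklearn's API, and the member predictions are made-up numbers used only for illustration.

```python
def ensemble_predict(member_predictions, weights):
    """Weighted average of per-model predictions.

    member_predictions: one list of predictions per model, all the same length.
    weights: one non-negative weight per model (normalized by their sum).
    """
    total = sum(weights)
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_predictions)) / total
        for i in range(n_samples)
    ]


# Weights copied from the leaderboard (ranks 1 to 5).
weights = [0.46, 0.32, 0.14, 0.02, 0.06]
print(round(sum(weights), 10))  # 1.0

# Made-up predictions of the five models for two samples.
member_predictions = [
    [150.0, 80.0],
    [140.0, 90.0],
    [160.0, 70.0],
    [100.0, 120.0],
    [130.0, 100.0],
]
print(ensemble_predict(member_predictions, weights))
```

Because the weights are normalized, a model with a low weight (such as `random_forest` at 0.02) can shift the ensemble's output only slightly, even if its own predictions differ a lot.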

Get the score of the final ensemble

After training the estimator, we can quantify the goodness of fit. One possible metric is the R2 score, which ranges from -inf to 1, with 1 being the best possible value. A dummy estimator that always predicts the mean of the data has an R2 score of 0 on that data.
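To make these bounds concrete, here is a plain-Python sketch of the single-output R2 definition, 1 - SS_res / SS_tot. The function name `r2_from_scratch` is hypothetical. `sklearn.metrics.r2_score` additionally supports sample weights and multi-output targets, and it handles constant targets differently. This sketch simply rejects constant targets.

```python
def r2_from_scratch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res is the squared error of the predictions; SS_tot is the squared
    error of always predicting the mean of y_true.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined here when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r2_from_scratch(y_true, [1.0, 2.0, 3.0]))  # 1.0  (perfect predictions)
print(r2_from_scratch(y_true, [2.0, 2.0, 2.0]))  # 0.0  (always predict the mean)
print(r2_from_scratch(y_true, [3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```

Nothing bounds SS_res from above, so a model worse than the mean predictor has a negative R2 with no lower limit.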

train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))


Train R2 score: 0.5944780427522034
Test R2 score: 0.3959585042866587

Plot the predictions

Furthermore, we can now visually inspect the predictions. We plot the true value against the predictions and show results on train and test data. Points on the diagonal depict perfect predictions. Points below the diagonal were overestimated by the model (predicted value is higher than the true value), points above the diagonal were underestimated (predicted value is lower than the true value).
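The diagonal rule can be checked numerically. In this plot the predicted value is on the x-axis and the true value is on the y-axis, so a point lies below the diagonal exactly when the prediction exceeds the true value. `classify_point` is a hypothetical helper, and the `tol` tolerance for "on the diagonal" is an assumption added for floating-point comparisons.

```python
def classify_point(predicted, true, tol=1e-9):
    """Label one point of the predicted (x) vs. true (y) scatter plot."""
    if abs(predicted - true) <= tol:
        return "on diagonal (perfect)"
    if predicted > true:
        # y < x: the point sits below the line y = x
        return "below diagonal (overestimated)"
    # y > x: the point sits above the line y = x
    return "above diagonal (underestimated)"


for predicted, true in [(150.0, 150.0), (200.0, 120.0), (90.0, 180.0)]:
    print(predicted, true, classify_point(predicted, true))
```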

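# Scatter the predictions (x-axis) against the true targets (y-axis)
plt.scatter(train_predictions, y_train, label="Train samples", c='#d95f02')
plt.scatter(test_predictions, y_test, label="Test samples", c='#7570b3')
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()  # show the "Train samples" / "Test samples" labels
# Draw the diagonal y = x; points on it are perfect predictions
plt.plot([30, 400], [30, 400], c='k', zorder=0)
plt.xlim([30, 400])
plt.ylim([30, 400])
plt.tight_layout()
plt.show()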

Total running time of the script: (1 minute 56.523 seconds)

Gallery generated by Sphinx-Gallery