.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/20_basic/example_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_20_basic_example_regression.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_20_basic_example_regression.py:


==========
Regression
==========

The following example shows how to fit a simple regression model with
*auto-sklearn*.

.. GENERATED FROM PYTHON SOURCE LINES 10-18

.. code-block:: default


    from pprint import pprint

    import sklearn.datasets
    import sklearn.metrics
    import sklearn.model_selection

    import autosklearn.regression
    import matplotlib.pyplot as plt

.. GENERATED FROM PYTHON SOURCE LINES 19-21

Data Loading
============

.. GENERATED FROM PYTHON SOURCE LINES 21-28

.. code-block:: default


    X, y = sklearn.datasets.load_diabetes(return_X_y=True)

    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, random_state=1
    )

.. GENERATED FROM PYTHON SOURCE LINES 29-31

Build and fit a regressor
=========================

.. GENERATED FROM PYTHON SOURCE LINES 31-39

.. code-block:: default


    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder="/tmp/autosklearn_regression_example_tmp",
    )
    automl.fit(X_train, y_train, dataset_name="diabetes")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AutoSklearnRegressor(ensemble_class=, per_run_time_limit=30,
                         time_left_for_this_task=120,
                         tmp_folder='/tmp/autosklearn_regression_example_tmp')

.. GENERATED FROM PYTHON SOURCE LINES 40-42

View the models found by auto-sklearn
=====================================

.. GENERATED FROM PYTHON SOURCE LINES 42-45

.. code-block:: default


    print(automl.leaderboard())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

              rank  ensemble_weight               type      cost  duration
    model_id
    25           1             0.46                sgd  0.436679  0.701417
    6            2             0.32     ard_regression  0.455042  0.779423
    27           3             0.14     ard_regression  0.462249  0.826378
    11           4             0.02      random_forest  0.507400  9.763534
    7            5             0.06  gradient_boosting  0.518673  1.450713

.. GENERATED FROM PYTHON SOURCE LINES 46-48

Print the final ensemble constructed by auto-sklearn
====================================================

.. GENERATED FROM PYTHON SOURCE LINES 48-51

.. code-block:: default


    pprint(automl.show_models(), indent=4)

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    {   6: {   'cost': 0.4550418898836528,
               'data_preprocessor': ,
               'ensemble_weight': 0.32,
               'feature_preprocessor': ,
               'model_id': 6,
               'rank': 1,
               'regressor': ,
               'sklearn_regressor': ARDRegression(alpha_1=0.0003701926442639788,
                                                  alpha_2=2.2118001735899097e-07,
                                                  copy_X=False,
                                                  lambda_1=1.2037591637980971e-06,
                                                  lambda_2=4.358378124977852e-09,
                                                  threshold_lambda=1136.5286041327277,
                                                  tol=0.021944240404849075)},
        7: {   'cost': 0.5186726734789994,
               'data_preprocessor': ,
               'ensemble_weight': 0.06,
               'feature_preprocessor': ,
               'model_id': 7,
               'rank': 2,
               'regressor': ,
               'sklearn_regressor': HistGradientBoostingRegressor(l2_regularization=1.8428972335335263e-10,
                                                                   learning_rate=0.012607824914758717,
                                                                   max_iter=512,
                                                                   max_leaf_nodes=10,
                                                                   min_samples_leaf=8,
                                                                   n_iter_no_change=0,
                                                                   random_state=1,
                                                                   validation_fraction=None,
                                                                   warm_start=True)},
        11: {   'cost': 0.5073997164657239,
                'data_preprocessor': ,
                'ensemble_weight': 0.02,
                'feature_preprocessor': ,
                'model_id': 11,
                'rank': 3,
                'regressor': ,
                'sklearn_regressor': RandomForestRegressor(bootstrap=False,
                                                           criterion='mae',
                                                           max_features=0.6277363920171745,
                                                           min_samples_leaf=6,
                                                           min_samples_split=15,
                                                           n_estimators=512,
                                                           n_jobs=1,
                                                           random_state=1,
                                                           warm_start=True)},
        25: {   'cost': 0.43667876507897496,
                'data_preprocessor': ,
                'ensemble_weight': 0.46,
                'feature_preprocessor': ,
                'model_id': 25,
                'rank': 4,
                'regressor': ,
                'sklearn_regressor': SGDRegressor(alpha=0.0006517033225329654,
                                                  epsilon=0.012150149892783745,
                                                  eta0=0.016444224834275295,
                                                  l1_ratio=1.7462342366289323e-09,
                                                  loss='epsilon_insensitive',
                                                  max_iter=16,
                                                  penalty='elasticnet',
                                                  power_t=0.21521743568582094,
                                                  random_state=1,
                                                  tol=0.002431731981071206,
                                                  warm_start=True)},
        27: {   'cost': 0.4622486119001967,
                'data_preprocessor': ,
                'ensemble_weight': 0.14,
                'feature_preprocessor': ,
                'model_id': 27,
                'rank': 5,
                'regressor': ,
                'sklearn_regressor': ARDRegression(alpha_1=2.7664515192592053e-05,
                                                   alpha_2=9.504988116581138e-07,
                                                   copy_X=False,
                                                   lambda_1=6.50650698230178e-09,
                                                   lambda_2=4.238533890074848e-07,
                                                   threshold_lambda=78251.58542976103,
                                                   tol=0.0007301343236220855)}}

.. GENERATED FROM PYTHON SOURCE LINES 52-58

Get the Score of the final ensemble
===================================

After training the estimator, we can now quantify the goodness of fit. One
possibility is the R2 score (coefficient of determination). Its values range
between -inf and 1, with 1 being the best possible value. A dummy estimator
that always predicts the data mean has an R2 score of 0.

.. GENERATED FROM PYTHON SOURCE LINES 58-64

.. code-block:: default


    train_predictions = automl.predict(X_train)
    print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
    test_predictions = automl.predict(X_test)
    print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Train R2 score: 0.5944780427522034
    Test R2 score: 0.3959585042866587
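The dummy baseline mentioned above is easy to double-check. The following
minimal sketch is not part of the generated example; it assumes the
``X_train``, ``y_train``, ``X_test`` and ``y_test`` arrays defined earlier and
uses scikit-learn's ``DummyRegressor`` to show that a mean predictor scores
(approximately) 0 in R2.

.. code-block:: python

    import sklearn.dummy
    import sklearn.metrics

    # Baseline that always predicts the mean of the training targets.
    dummy = sklearn.dummy.DummyRegressor(strategy="mean")
    dummy.fit(X_train, y_train)

    # Exactly 0 on the data whose mean is predicted, close to 0 on the test split.
    print("Dummy train R2 score:", sklearn.metrics.r2_score(y_train, dummy.predict(X_train)))
    print("Dummy test R2 score:", sklearn.metrics.r2_score(y_test, dummy.predict(X_test)))

Any useful model should clearly beat this baseline, as the ensemble found
above does.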
.. GENERATED FROM PYTHON SOURCE LINES 65-72

Plot the predictions
====================

Furthermore, we can now visually inspect the predictions. We plot the true
value against the predictions and show results on train and test data. Points
on the diagonal depict perfect predictions. Points below the diagonal were
overestimated by the model (the predicted value is higher than the true
value), while points above the diagonal were underestimated (the predicted
value is lower than the true value).

.. GENERATED FROM PYTHON SOURCE LINES 72-83

.. code-block:: default


    plt.scatter(train_predictions, y_train, label="Train samples", c="#d95f02")
    plt.scatter(test_predictions, y_test, label="Test samples", c="#7570b3")
    plt.xlabel("Predicted value")
    plt.ylabel("True value")
    plt.legend()
    plt.plot([30, 400], [30, 400], c="k", zorder=0)
    plt.xlim([30, 400])
    plt.ylim([30, 400])
    plt.tight_layout()
    plt.show()

.. image-sg:: /examples/20_basic/images/sphx_glr_example_regression_001.png
   :alt: example regression
   :srcset: /examples/20_basic/images/sphx_glr_example_regression_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 2 minutes 3.004 seconds)


.. _sphx_glr_download_examples_20_basic_example_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/auto-sklearn/master?urlpath=lab/tree/notebooks/examples/20_basic/example_regression.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_regression.py <example_regression.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_regression.ipynb <example_regression.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_