.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/40_advanced/example_pandas_train_test.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_40_advanced_example_pandas_train_test.py:


==========================
Performance-over-time plot
==========================

This example shows how to use the *performance_over_time_* attribute to plot
the performance over training time. *performance_over_time_* can contain
multiple metrics within a pandas DataFrame, namely:

- ensemble_optimization_score
- ensemble_test_score
- single_best_optimization_score
- single_best_test_score
- single_best_train_score

*auto-sklearn* can automatically encode categorical columns using a label/ordinal
encoder. This example highlights how to properly set the dtypes in a DataFrame for
this to happen, and shows how to also pass test data to auto-sklearn.

The ``X_train``/``y_train`` arguments to ``fit()`` are used to fit the scikit-learn
models, whereas ``X_test``/``y_test`` are used to evaluate how well these models
generalize to unseen data (i.e. data not in ``X_train``/``y_train``). Test data is a
good way to check whether the trained model overfits; more details can be found in
`evaluating estimator performance `_.

In order to provide the *\*_test_score* metrics, ``X_test`` and ``y_test`` must be
passed to the AutoML model, as shown in this example.

Feature types (whether a column is categorical or numerical) can also be indicated
manually via the ``feat_type`` argument of ``fit()``. This is important when working
with lists or NumPy arrays, which have no per-column dtype (see the example
:ref:`sphx_glr_examples_40_advanced_example_feature_types.py` for details).
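
When the data arrives as a list or NumPy array, the categorical/numerical
information that DataFrame dtypes carry in this example has to be supplied through
``feat_type`` instead. The sketch below derives such a list from a DataFrame's
dtypes. The helper ``feat_type_from_dtypes`` is hypothetical (it is not part of
auto-sklearn), and it assumes that ``bool`` and ``category`` columns map to
``"Categorical"`` and all other columns to ``"Numerical"``.

.. code-block:: python

    import pandas as pd


    def feat_type_from_dtypes(df: pd.DataFrame) -> list:
        """Hypothetical helper: map each column dtype to a feat_type string.

        Assumption: bool and category columns are categorical, everything
        else is numerical.
        """
        feat_type = []
        for dtype in df.dtypes:
            if isinstance(dtype, pd.CategoricalDtype) or pd.api.types.is_bool_dtype(dtype):
                feat_type.append("Categorical")
            else:
                feat_type.append("Numerical")
        return feat_type


    # Small synthetic frame reusing a few column names from this example.
    X_small = pd.DataFrame(
        {
            "A1": [True, False, True],
            "A2": [0.5, 1.5, 2.5],
            "A4": pd.Series([1.0, 2.0, 1.0]).astype("category"),
        }
    )
    feat_type = feat_type_from_dtypes(X_small)
    print(feat_type)  # ['Categorical', 'Numerical', 'Categorical']

    # Converting to NumPy drops the per-column dtypes, so the list above would
    # have to be passed separately, e.g. as fit(..., feat_type=feat_type).
    X_array = X_small.to_numpy()

With a DataFrame whose dtypes are set as in this example, no such list is needed,
because the dtypes already carry the information.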

.. GENERATED FROM PYTHON SOURCE LINES 33-46

.. code-block:: default


    import time

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import sklearn.model_selection
    import sklearn.datasets
    import sklearn.metrics

    from smac.tae import StatusType

    import autosklearn.classification


.. GENERATED FROM PYTHON SOURCE LINES 47-49

Data Loading
============

.. GENERATED FROM PYTHON SOURCE LINES 49-80

.. code-block:: default


    # Using the Australian dataset https://www.openml.org/d/40981.
    # This example uses fetch_openml, which downloads a properly
    # formatted DataFrame if as_frame=True is used.
    # For demonstration purposes, we download a NumPy array using
    # as_frame=False and create the pandas DataFrame manually.
    X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=False)

    # bool and category columns will be automatically encoded.
    # Targets for classification are also automatically encoded.
    # With as_frame=True, fetch_openml already returns properly typed
    # data; the casting below is an example for user reference.
    X = pd.DataFrame(data=X, columns=["A" + str(i) for i in range(1, 15)])
    desired_boolean_columns = ["A1"]
    desired_categorical_columns = ["A4", "A5", "A6", "A8", "A9", "A11", "A12"]
    desired_numerical_columns = ["A2", "A3", "A7", "A10", "A13", "A14"]
    for column in X.columns:
        if column in desired_boolean_columns:
            X[column] = X[column].astype("bool")
        elif column in desired_categorical_columns:
            X[column] = X[column].astype("category")
        else:
            # All remaining columns (desired_numerical_columns) stay numeric.
            X[column] = pd.to_numeric(X[column])

    y = pd.DataFrame(y, dtype="category")

    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.5, random_state=3
    )
    print(X.dtypes)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    A1         bool
    A2      float64
    A3      float64
    A4     category
    A5     category
    A6     category
    A7      float64
    A8     category
    A9     category
    A10     float64
    A11    category
    A12    category
    A13     float64
    A14     float64
    dtype: object
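
To see the effect of the casting loop above in isolation, the following sketch
applies the same three conversions to a tiny synthetic frame (the values are made
up) and then shows the integer codes pandas assigns to a ``category`` column.
These codes illustrate what a label/ordinal encoding does; whether auto-sklearn's
internal encoder produces exactly the same integers is an assumption, and it may
differ in detail (for example in how unseen categories are handled).

.. code-block:: python

    import pandas as pd

    # Tiny synthetic stand-in for the float array returned by
    # fetch_openml(..., as_frame=False); the values are illustrative only.
    df = pd.DataFrame({"A1": [1.0, 0.0, 1.0], "A4": [2.0, 1.0, 3.0], "A2": [0.5, 1.5, 2.5]})

    # Same conversions as the loop above: bool, category, numeric.
    df["A1"] = df["A1"].astype("bool")
    df["A4"] = df["A4"].astype("category")
    df["A2"] = pd.to_numeric(df["A2"])
    print(df.dtypes.astype(str).tolist())  # ['bool', 'category', 'float64']

    # An ordinal encoding maps each value to the index of its category in the
    # sorted category list; pandas stores exactly these integers as the codes.
    print(df["A4"].cat.categories.tolist())  # [1.0, 2.0, 3.0]
    print(df["A4"].cat.codes.tolist())  # [1, 0, 2]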

.. GENERATED FROM PYTHON SOURCE LINES 81-83

Build and fit a classifier
==========================

.. GENERATED FROM PYTHON SOURCE LINES 83-90

.. code-block:: default


    cls = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
    )
    cls.fit(X_train, y_train, X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AutoSklearnClassifier(ensemble_class=, per_run_time_limit=30, time_left_for_this_task=120)


.. GENERATED FROM PYTHON SOURCE LINES 91-93

Get the score of the final ensemble
===================================

.. GENERATED FROM PYTHON SOURCE LINES 93-97

.. code-block:: default


    predictions = cls.predict(X_test)
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Accuracy score 0.8666666666666667


.. GENERATED FROM PYTHON SOURCE LINES 98-102

Plot the ensemble performance
=============================

The *performance_over_time_* attribute returns a pandas DataFrame, which can be
used directly for plotting.

.. GENERATED FROM PYTHON SOURCE LINES 102-112

.. code-block:: default


    poT = cls.performance_over_time_
    poT.plot(
        x="Timestamp",
        kind="line",
        legend=True,
        title="Auto-sklearn accuracy over time",
        grid=True,
    )
    plt.show()


.. image-sg:: /examples/40_advanced/images/sphx_glr_example_pandas_train_test_001.png
   :alt: Auto-sklearn accuracy over time
   :srcset: /examples/40_advanced/images/sphx_glr_example_pandas_train_test_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 2 minutes 2.445 seconds)


.. _sphx_glr_download_examples_40_advanced_example_pandas_train_test.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/auto-sklearn/master?urlpath=lab/tree/notebooks/examples/40_advanced/example_pandas_train_test.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_pandas_train_test.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_pandas_train_test.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_