.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/40_advanced/example_single_configuration.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_40_advanced_example_single_configuration.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_40_advanced_example_single_configuration.py:


==========================
Fit a single configuration
==========================

*Auto-sklearn* searches for the best combination of machine learning algorithms
and their hyperparameter configurations for a given task, using Scikit-Learn Pipelines.
To further improve performance, these pipelines are combined into an ensemble using
Ensemble Selection from Caruana et al. (2004).
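
Ensemble Selection greedily builds an ensemble by repeatedly adding, with replacement,
the model whose inclusion most improves the ensemble's validation performance; a model's
weight is the number of times it was picked. The sketch below is a simplified,
standard-library illustration for binary classification scored by accuracy. It is not
Auto-sklearn's implementation, and the function name ``ensemble_selection`` and the toy
predictions are hypothetical.

.. code-block:: python

    from collections import Counter


    def ensemble_selection(val_probs, y_val, ensemble_size=10):
        """Greedy forward selection with replacement (simplified sketch).

        val_probs: one list per model with the predicted probability of the
                   positive class for each validation sample.
        y_val:     true 0/1 labels of the validation samples.
        Returns a Counter mapping model index -> times selected.
        """
        n_samples = len(y_val)
        running_sum = [0.0] * n_samples  # summed probabilities of selected models
        selected = []

        def error(probs):
            # Fraction of misclassified samples when thresholding at 0.5
            wrong = sum((p >= 0.5) != bool(t) for p, t in zip(probs, y_val))
            return wrong / n_samples

        for step in range(1, ensemble_size + 1):
            best_idx, best_err = None, float("inf")
            for idx, probs in enumerate(val_probs):
                # Averaged prediction if this model were added once more
                candidate = [(s + p) / step for s, p in zip(running_sum, probs)]
                err = error(candidate)
                if err < best_err:  # ties keep the earliest model
                    best_idx, best_err = idx, err
            selected.append(best_idx)
            running_sum = [s + p for s, p in zip(running_sum, val_probs[best_idx])]

        return Counter(selected)


    val_probs = [
        [0.9, 0.2, 0.4, 0.1],  # model 0: misses the third sample
        [0.6, 0.7, 0.8, 0.3],  # model 1: misses the second sample
        [0.2, 0.9, 0.1, 0.8],  # model 2: wrong on every sample
    ]
    y_val = [1, 0, 1, 0]
    print(ensemble_selection(val_probs, y_val, ensemble_size=3))  # Counter({0: 2, 1: 1})

Model 2 is never selected: averaging models 0 and 1 already classifies every validation
sample correctly, which is the kind of complementarity Ensemble Selection exploits.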


This example shows how to fit one of these pipelines, both with a user-defined
configuration and with one sampled at random from the configuration space.

The pipelines that Auto-Sklearn fits are compatible with the Scikit-Learn API. You can
find further documentation about Scikit-Learn models in the
`Scikit-Learn getting started guide <https://scikit-learn.org/stable/getting_started.html>`_.

.. GENERATED FROM PYTHON SOURCE LINES 19-29

.. code-block:: default

    import numpy as np
    import sklearn.model_selection
    import sklearn.datasets
    import sklearn.metrics

    from ConfigSpace.configuration_space import Configuration

    import autosklearn.classification


.. GENERATED FROM PYTHON SOURCE LINES 30-32

Data Loading
============

.. GENERATED FROM PYTHON SOURCE LINES 32-38

.. code-block:: default


    X, y = sklearn.datasets.fetch_openml(data_id=3, return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.5, random_state=3
    )


.. GENERATED FROM PYTHON SOURCE LINES 39-41

Define an estimator
===================

.. GENERATED FROM PYTHON SOURCE LINES 41-53

.. code-block:: default


    cls = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=60,
        memory_limit=4096,
        # Restrict the configuration space so that RandomForest is the
        # only valid classifier. We recommend enabling all available
        # models to get better performance.
        include={"classifier": ["random_forest"]},
        delete_tmp_folder_after_terminate=False,
    )
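
Every hyperparameter in an Auto-Sklearn configuration has a colon-separated name whose
first part is the pipeline step, for example ``classifier:random_forest:min_samples_split``,
and each ``...:__choice__`` entry records which component was picked for that step. With
``include={"classifier": ["random_forest"]}``, ``random_forest`` should be the only possible
value of ``classifier:__choice__``. The minimal sketch below reads such names from a plain
dict (a few entries copied from the configuration printed later in this example, not a
ConfigSpace ``Configuration`` object); the helpers ``active_components`` and
``hyperparameters_of`` are hypothetical.

.. code-block:: python

    def active_components(config):
        """Map each pipeline step to the component chosen for it."""
        return {
            key.rsplit(":", 1)[0]: value
            for key, value in config.items()
            if key.endswith(":__choice__")
        }


    def hyperparameters_of(config, component_prefix):
        """Collect hyperparameters under a component prefix, skipping __choice__ keys."""
        prefix = component_prefix + ":"
        return {
            key[len(prefix):]: value
            for key, value in config.items()
            if key.startswith(prefix) and not key.endswith("__choice__")
        }


    example_config = {
        "classifier:__choice__": "random_forest",
        "classifier:random_forest:criterion": "gini",
        "classifier:random_forest:min_samples_split": 11,
        "feature_preprocessor:__choice__": "fast_ica",
        "feature_preprocessor:fast_ica:fun": "cube",
    }
    print(active_components(example_config))
    print(hyperparameters_of(example_config, "classifier:random_forest"))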


.. GENERATED FROM PYTHON SOURCE LINES 54-56

Fit a user-provided configuration
=================================

.. GENERATED FROM PYTHON SOURCE LINES 56-95

.. code-block:: default


    # Sample a random configuration, then override the Random Forest's
    # min_samples_split with a user-defined value. To learn how configuration
    # spaces work, see the ConfigSpace documentation:
    # https://automl.github.io/ConfigSpace/master/
    cs = cls.get_configuration_space(X, y, dataset_name="kr-vs-kp")
    config = cs.sample_configuration()
    # _values is a private attribute; writing to it bypasses ConfigSpace's checks,
    # so the configuration is validated right afterwards
    config._values["classifier:random_forest:min_samples_split"] = 11

    # Make sure that the modified configuration is still valid for the
    # configuration space (an error is raised otherwise)
    config.is_valid_configuration()

    pipeline, run_info, run_value = cls.fit_pipeline(
        X=X_train,
        y=y_train,
        dataset_name="kr-vs-kp",
        config=config,
        X_test=X_test,
        y_test=y_test,
    )

    # The returned pipeline follows the Scikit-Learn Pipeline API:
    # https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
    print(pipeline.named_steps)

    # fit_pipeline() also returns a RunInfo named tuple describing the run
    # (configuration, seed, time cutoff and budget)
    print(run_info)

    # and a RunValue named tuple with the run's cost (loss), timing and status
    print(run_value)

    # Check that the pipeline was built from the configuration we passed
    print("Passed Configuration:", pipeline.config)
    print("Random Forest:", pipeline.named_steps["classifier"].choice.estimator)

    # We can also search for new configurations with fit(). All configurations
    # evaluated by Auto-Sklearn, including those run via fit_pipeline(), are
    # stored on disk and can be used for Ensemble Selection. Fit on the training
    # split only, so that X_test stays unseen.
    cls.fit(X_train, y_train, dataset_name="kr-vs-kp")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
      warnings.warn(
    {'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d3f33b50>, 'balancing': Balancing(random_state=1, strategy='weighting'), 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d24aed00>, 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d3f6f6d0>}
    RunInfo(config=Configuration(values={
      'balancing:strategy': 'weighting',
      'classifier:__choice__': 'random_forest',
      'classifier:random_forest:bootstrap': 'True',
      'classifier:random_forest:criterion': 'gini',
      'classifier:random_forest:max_depth': 'None',
      'classifier:random_forest:max_features': 0.9678506216566037,
      'classifier:random_forest:max_leaf_nodes': 'None',
      'classifier:random_forest:min_impurity_decrease': 0.0,
      'classifier:random_forest:min_samples_leaf': 4,
      'classifier:random_forest:min_samples_split': 11,
      'classifier:random_forest:min_weight_fraction_leaf': 0.0,
      'data_preprocessor:__choice__': 'feature_type',
      'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
      'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
      'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
      'feature_preprocessor:__choice__': 'fast_ica',
      'feature_preprocessor:fast_ica:algorithm': 'deflation',
      'feature_preprocessor:fast_ica:fun': 'cube',
      'feature_preprocessor:fast_ica:whiten': 'False',
    })
    , instance=None, instance_specific=None, seed=1, cutoff=60, capped=False, budget=0.0, source_id=0)
    RunValue(cost=0.06161137440758291, time=32.43082547187805, status=<StatusType.SUCCESS: 1>, starttime=1663663980.6781428, endtime=1663664013.1358533, additional_info={'duration': 32.330575466156006, 'num_run': 2, 'train_loss': 0.004670714619336769, 'configuration_origin': None})
    Passed Configuration: Configuration(values={
      'balancing:strategy': 'weighting',
      'classifier:__choice__': 'random_forest',
      'classifier:random_forest:bootstrap': 'True',
      'classifier:random_forest:criterion': 'gini',
      'classifier:random_forest:max_depth': 'None',
      'classifier:random_forest:max_features': 0.9678506216566037,
      'classifier:random_forest:max_leaf_nodes': 'None',
      'classifier:random_forest:min_impurity_decrease': 0.0,
      'classifier:random_forest:min_samples_leaf': 4,
      'classifier:random_forest:min_samples_split': 11,
      'classifier:random_forest:min_weight_fraction_leaf': 0.0,
      'data_preprocessor:__choice__': 'feature_type',
      'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
      'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
      'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
      'feature_preprocessor:__choice__': 'fast_ica',
      'feature_preprocessor:fast_ica:algorithm': 'deflation',
      'feature_preprocessor:fast_ica:fun': 'cube',
      'feature_preprocessor:fast_ica:whiten': 'False',
    })

    Random Forest: RandomForestClassifier(max_features=62, min_samples_leaf=4,
                           min_samples_split=11, n_estimators=512, n_jobs=1,
                           random_state=1, warm_start=True)
    /home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
      warnings.warn(
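
The printed ``RandomForestClassifier`` has ``max_features=62``, whereas the configuration
stores ``classifier:random_forest:max_features`` as a float between 0 and 1
(0.9678506216566037). As far as we can tell from Auto-Sklearn's random forest component,
that float is used as an exponent on the number of input columns,
``int(n_features ** max_features)``, so that 0.5 recovers the usual square-root rule. The
sketch below reproduces this assumed mapping with a hypothetical helper
``rf_max_features``; 72 columns is an illustrative value consistent with the printed 62,
not a column count we verified for this run.

.. code-block:: python

    def rf_max_features(n_features, max_features_exponent):
        """Map a [0, 1] max_features hyperparameter to a number of columns.

        Assumes the rule int(n_features ** exponent); an exponent of 0.5
        gives the common sqrt(n_features) heuristic.
        """
        return int(n_features ** float(max_features_exponent))


    print(rf_max_features(72, 0.9678506216566037))  # 62
    print(rf_max_features(100, 0.5))  # 10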


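``run_value`` bundles the outcome of the run. The minimal sketch below reads it using a
stand-in ``namedtuple`` that copies the field names and values printed above; the real
object is SMAC's ``RunValue``, so this is illustrative only. It assumes that, for a metric
whose optimum is 1 (such as accuracy), Auto-Sklearn reports ``cost`` and ``train_loss`` as
``1 - score``; we have not confirmed which metric this run used.

.. code-block:: python

    from collections import namedtuple

    # Stand-in with the same field names as the RunValue printed above
    RunValue = namedtuple(
        "RunValue",
        ["cost", "time", "status", "starttime", "endtime", "additional_info"],
    )

    run_value = RunValue(
        cost=0.06161137440758291,
        time=32.43082547187805,
        status="SUCCESS",
        starttime=1663663980.6781428,
        endtime=1663664013.1358533,
        additional_info={"train_loss": 0.004670714619336769},
    )

    # Assumption: the metric's optimum is 1, so score = 1 - loss
    score = 1.0 - run_value.cost
    train_score = 1.0 - run_value.additional_info["train_loss"]
    wall_clock = run_value.endtime - run_value.starttime

    print(f"score from cost: {score:.4f}")
    print(f"train score: {train_score:.4f}")
    print(f"wall-clock seconds: {wall_clock:.1f}")

A train score well above the score derived from ``cost`` is expected for a random forest,
which can fit its training data closely.
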
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 2 minutes  46.876 seconds)


.. _sphx_glr_download_examples_40_advanced_example_single_configuration.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/auto-sklearn/master?urlpath=lab/tree/notebooks/examples/40_advanced/example_single_configuration.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_single_configuration.py <example_single_configuration.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_single_configuration.ipynb <example_single_configuration.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_