.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/40_advanced/example_pass_feature_types.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_40_advanced_example_pass_feature_types.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_40_advanced_example_pass_feature_types.py:


=====================================================
Tabular Classification with user-passed feature types
=====================================================

The following example shows how to pass feature types for datasets in NumPy
format (this also works for dataframes and lists) and fit a sample classification
model with AutoPyTorch.

AutoPyTorch relies on column dtypes to interpret the feature types. These can be
misinterpreted, however: when a dataset is passed as a NumPy array, every column
is interpreted as numerical if its dtype is int or float, even though categorical
values may have been encoded as integers.

Passing feature types helps AutoPyTorch interpret the columns correctly and also
validates the dataset by checking the dtype of each column for incompatibilities.
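
The dtype problem can be reproduced with NumPy alone. The following is a minimal
sketch on made-up data; the array ``X_toy``, its column meanings and the
hand-written ``toy_feat_types`` list are illustrative assumptions and are not part
of the dataset used in this example.

.. code-block:: python

    import numpy as np

    # Hypothetical toy data: column 0 is a measurement, column 1 is a
    # category that has been encoded as the integers 0, 1 and 2.
    X_toy = np.array([
        [0.5, 0],
        [1.7, 2],
        [3.2, 1],
        [2.4, 2],
    ])

    # NumPy stores a single dtype for the whole array, so nothing marks
    # column 1 as categorical.
    print(X_toy.dtype)  # float64

    # One label per column, in column order, states the intended types.
    toy_feat_types = ["numerical", "categorical"]
    assert len(toy_feat_types) == X_toy.shape[1]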

.. GENERATED FROM PYTHON SOURCE LINES 18-36

.. code-block:: default

    import os
    import tempfile as tmp
    import warnings

    os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
    os.environ['OMP_NUM_THREADS'] = '1'
    os.environ['OPENBLAS_NUM_THREADS'] = '1'
    os.environ['MKL_NUM_THREADS'] = '1'

    warnings.simplefilter(action='ignore', category=UserWarning)
    warnings.simplefilter(action='ignore', category=FutureWarning)

    import openml
    import sklearn.model_selection

    from autoPyTorch.api.tabular_classification import TabularClassificationTask


.. GENERATED FROM PYTHON SOURCE LINES 37-39

Data Loading
============

.. GENERATED FROM PYTHON SOURCE LINES 39-54

.. code-block:: default

    task = openml.tasks.get_task(task_id=146821)
    dataset = task.get_dataset()
    X, y, categorical_indicator, _ = dataset.get_data(
        dataset_format='array',
        target=dataset.default_target_attribute,
    )
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X,
        y,
        random_state=1,
    )

    feat_types = ["numerical" if not indicator else "categorical" for indicator in categorical_indicator]

    # 
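
``categorical_indicator`` holds one boolean per column, and the list
comprehension above turns it into the ``"numerical"`` and ``"categorical"``
labels passed to the search later on. A small sketch of the same mapping, using
a hand-written indicator (the values are hypothetical and do not come from the
OpenML dataset):

.. code-block:: python

    # Hypothetical indicator for a four-column dataset.
    example_indicator = [False, True, True, False]

    # Same mapping as above, written with the positive case first.
    example_feat_types = [
        "categorical" if is_categorical else "numerical"
        for is_categorical in example_indicator
    ]

    print(example_feat_types)
    # ['numerical', 'categorical', 'categorical', 'numerical']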


.. GENERATED FROM PYTHON SOURCE LINES 55-57

Build and fit a classifier
==========================

.. GENERATED FROM PYTHON SOURCE LINES 57-67

.. code-block:: default

    api = TabularClassificationTask(
        # To keep the logs of the run, uncomment the
        # following lines
        # temporary_directory='./tmp/autoPyTorch_example_tmp_01',
        # output_directory='./tmp/autoPyTorch_example_out_01',
        # delete_tmp_folder_after_terminate=False,
        # delete_output_folder_after_terminate=False,
        seed=42,
    )


.. GENERATED FROM PYTHON SOURCE LINES 68-70

Search for an ensemble of machine learning algorithms
=====================================================

.. GENERATED FROM PYTHON SOURCE LINES 70-83

.. code-block:: default

    api.search(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test.copy(),
        y_test=y_test.copy(),
        dataset_name='Australian',
        optimize_metric='accuracy',
        total_walltime_limit=100,
        func_eval_time_limit_secs=50,
        feat_types=feat_types,
        enable_traditional_pipeline=False
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f9aa468d760>


.. GENERATED FROM PYTHON SOURCE LINES 84-86

Print the final ensemble performance
====================================

.. GENERATED FROM PYTHON SOURCE LINES 86-94

.. code-block:: default

    y_pred = api.predict(X_test)
    score = api.score(y_pred, y_test)
    print(score)
    # Print the final ensemble built by AutoPyTorch
    print(api.show_models())

    # Print statistics from search
    print(api.sprint_statistics())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'accuracy': 0.9490740740740741}
    |    | Preprocessing                                                                                | Estimator                                                       |   Weight |
    |---:|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
    |  0 | SimpleImputer,Variance Threshold,MinorityCoalescer,NoEncoder,NoScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential       |     0.74 |
    |  1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,NoFeaturePreprocessing   | embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential       |     0.14 |
    |  2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,NoFeaturePreprocessing   | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.12 |
    autoPyTorch results:
            Dataset name: Australian
            Optimisation Metric: accuracy
            Best validation score: 0.9135514018691588
            Number of target algorithm runs: 8
            Number of successful target algorithm runs: 7
            Number of crashed target algorithm runs: 0
            Number of target algorithms that exceeded the time limit: 1
            Number of target algorithms that exceeded the memory limit: 0
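
The ``Weight`` column in the table above gives each pipeline's share in the final
ensemble. As a rough illustration only, assuming the ensemble combines its
members by a weighted average of their class probabilities (an assumption about
the combination rule, with made-up probability values), the final prediction
could be sketched as:

.. code-block:: python

    import numpy as np

    # Ensemble weights taken from the table above.
    weights = np.array([0.74, 0.14, 0.12])

    # Hypothetical class probabilities from the three pipelines for two
    # samples and two classes; shape is (n_models, n_samples, n_classes).
    probabilities = np.array([
        [[0.9, 0.1], [0.4, 0.6]],
        [[0.6, 0.4], [0.2, 0.8]],
        [[0.7, 0.3], [0.5, 0.5]],
    ])

    # Weighted average over the model axis, then pick the most likely class.
    ensemble_proba = np.tensordot(weights, probabilities, axes=1)
    ensemble_pred = ensemble_proba.argmax(axis=1)
    print(ensemble_pred)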


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes  50.498 seconds)


.. _sphx_glr_download_examples_40_advanced_example_pass_feature_types.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/Auto-PyTorch/development?urlpath=lab/tree/notebooks/examples/40_advanced/example_pass_feature_types.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_pass_feature_types.py <example_pass_feature_types.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_pass_feature_types.ipynb <example_pass_feature_types.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_