.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/40_advanced/example_pass_feature_types.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_40_advanced_example_pass_feature_types.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_40_advanced_example_pass_feature_types.py:


=====================================================
Tabular Classification with user passed feature types
=====================================================

The following example shows how to pass feature types for datasets in
numpy format (this also works for dataframes and lists) and fit a sample
classification model with AutoPyTorch.

AutoPyTorch relies on column dtypes to interpret the feature types. These
can be misinterpreted: for example, when a dataset is passed as a numpy array,
all the data is interpreted as numerical if its dtype is int or float, even
though categorical values may have been encoded as integers.

Passing feature types helps AutoPyTorch interpret the columns correctly. It also
validates the dataset by checking the dtype of the columns for any
incompatibilities.

.. GENERATED FROM PYTHON SOURCE LINES 18-36

.. code-block:: default

    import os
    import tempfile as tmp
    import warnings

    os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
    os.environ['OMP_NUM_THREADS'] = '1'
    os.environ['OPENBLAS_NUM_THREADS'] = '1'
    os.environ['MKL_NUM_THREADS'] = '1'

    warnings.simplefilter(action='ignore', category=UserWarning)
    warnings.simplefilter(action='ignore', category=FutureWarning)

    import openml
    import sklearn.model_selection

    from autoPyTorch.api.tabular_classification import TabularClassificationTask


.. GENERATED FROM PYTHON SOURCE LINES 37-39

Data Loading
============

.. GENERATED FROM PYTHON SOURCE LINES 39-54

.. code-block:: default

    task = openml.tasks.get_task(task_id=146821)
    dataset = task.get_dataset()
    X, y, categorical_indicator, _ = dataset.get_data(
        dataset_format='array',
        target=dataset.default_target_attribute,
    )
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X,
        y,
        random_state=1,
    )

    # One entry per column: "categorical" where OpenML flags the column, else "numerical"
    feat_types = ["numerical" if not indicator else "categorical" for indicator in categorical_indicator]


.. GENERATED FROM PYTHON SOURCE LINES 55-57

Build and fit a classifier
==========================

.. GENERATED FROM PYTHON SOURCE LINES 57-67

.. code-block:: default

    api = TabularClassificationTask(
        # To maintain logs of the run, you can uncomment the
        # following lines
        # temporary_directory='./tmp/autoPyTorch_example_tmp_01',
        # output_directory='./tmp/autoPyTorch_example_out_01',
        # delete_tmp_folder_after_terminate=False,
        # delete_output_folder_after_terminate=False,
        seed=42,
    )


.. GENERATED FROM PYTHON SOURCE LINES 68-70

Search for an ensemble of machine learning algorithms
=====================================================

.. GENERATED FROM PYTHON SOURCE LINES 70-83

.. code-block:: default

    api.search(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test.copy(),
        y_test=y_test.copy(),
        dataset_name='Australian',
        optimize_metric='accuracy',
        total_walltime_limit=100,
        func_eval_time_limit_secs=50,
        feat_types=feat_types,
        enable_traditional_pipeline=False
    )


.. GENERATED FROM PYTHON SOURCE LINES 84-86

Print the final ensemble performance
====================================

.. GENERATED FROM PYTHON SOURCE LINES 86-94

.. code-block:: default

    y_pred = api.predict(X_test)
    score = api.score(y_pred, y_test)
    print(score)
    # Print the final ensemble built by AutoPyTorch
    print(api.show_models())

    # Print statistics from search
    print(api.sprint_statistics())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'accuracy': 0.9490740740740741}
    |    | Preprocessing                                                                                | Estimator                                                       |   Weight |
    |---:|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
    |  0 | SimpleImputer,Variance Threshold,MinorityCoalescer,NoEncoder,NoScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential       |     0.74 |
    |  1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,NoFeaturePreprocessing   | embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential       |     0.14 |
    |  2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,NoFeaturePreprocessing   | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.12 |
    autoPyTorch results:
        Dataset name: Australian
        Optimisation Metric: accuracy
        Best validation score: 0.9135514018691588
        Number of target algorithm runs: 8
        Number of successful target algorithm runs: 7
        Number of crashed target algorithm runs: 0
        Number of target algorithms that exceeded the time limit: 1
        Number of target algorithms that exceeded the memory limit: 0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes  50.498 seconds)


.. _sphx_glr_download_examples_40_advanced_example_pass_feature_types.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/Auto-PyTorch/development?urlpath=lab/tree/notebooks/examples/40_advanced/example_pass_feature_types.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_pass_feature_types.py <example_pass_feature_types.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_pass_feature_types.ipynb <example_pass_feature_types.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
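
In the Data Loading section, the feature-type list is derived from the ``categorical_indicator`` returned by OpenML. When a numpy array comes from elsewhere and no such indicator exists, the list can be written by hand. The following is a minimal sketch under stated assumptions: ``build_feat_types`` is a hypothetical helper (not part of AutoPyTorch), and the column count and categorical column indices are made up for illustration. It only produces the same ``"numerical"`` / ``"categorical"`` strings used in the example above.

```python
from typing import Iterable, List


def build_feat_types(n_columns: int, categorical_columns: Iterable[int]) -> List[str]:
    """Hypothetical helper: label each column 'numerical' or 'categorical'."""
    categorical = set(categorical_columns)
    # Reject indices that do not correspond to a column of the array.
    out_of_range = [c for c in categorical if not 0 <= c < n_columns]
    if out_of_range:
        raise ValueError(f"column indices out of range: {sorted(out_of_range)}")
    return ["categorical" if i in categorical else "numerical" for i in range(n_columns)]


# Made-up layout: 5 columns, where columns 1 and 3 hold integer-encoded categories.
feat_types = build_feat_types(n_columns=5, categorical_columns=[1, 3])
print(feat_types)
```

The resulting list has one entry per column, in column order, so it could be passed as ``feat_types`` in the same way as the list built from ``categorical_indicator``.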