.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/40_advanced/example_resampling_strategy.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_40_advanced_example_resampling_strategy.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_40_advanced_example_resampling_strategy.py:


===========================================================
Tabular Classification with Different Resampling Strategies
===========================================================

The following example shows how to fit a sample classification model
with different resampling strategies in AutoPyTorch.
By default, AutoPyTorch uses holdout validation with a 67% train size split,
that is, 33% of the training data is held out for validation (``val_share=0.33``).

.. GENERATED FROM PYTHON SOURCE LINES 11-29

.. code-block:: default

    import os
    import tempfile as tmp
    import warnings

    os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
    os.environ['OMP_NUM_THREADS'] = '1'
    os.environ['OPENBLAS_NUM_THREADS'] = '1'
    os.environ['MKL_NUM_THREADS'] = '1'

    warnings.simplefilter(action='ignore', category=UserWarning)
    warnings.simplefilter(action='ignore', category=FutureWarning)

    import sklearn.datasets
    import sklearn.model_selection

    from autoPyTorch.api.tabular_classification import TabularClassificationTask
    from autoPyTorch.datasets.resampling_strategy import CrossValTypes, HoldoutValTypes


.. GENERATED FROM PYTHON SOURCE LINES 30-32

Default Resampling Strategy
============================

.. GENERATED FROM PYTHON SOURCE LINES 34-36

Data Loading
------------

.. GENERATED FROM PYTHON SOURCE LINES 36-43

.. code-block:: default

    X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X,
        y,
        random_state=1,
    )


.. GENERATED FROM PYTHON SOURCE LINES 44-46

Build and fit a classifier with default resampling strategy
-----------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 46-55

.. code-block:: default

    api = TabularClassificationTask(
        # 'HoldoutValTypes.holdout_validation' with 'val_share': 0.33
        # is the default argument setting for TabularClassificationTask.
        # It is explicitly specified in this example for demonstration
        # purposes.
        resampling_strategy=HoldoutValTypes.holdout_validation,
        resampling_strategy_args={'val_share': 0.33}
    )


.. GENERATED FROM PYTHON SOURCE LINES 56-58

Search for an ensemble of machine learning algorithms
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 58-68

.. code-block:: default

    api.search(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test.copy(),
        y_test=y_test.copy(),
        optimize_metric='accuracy',
        total_walltime_limit=150,
        func_eval_time_limit_secs=30
    )


.. GENERATED FROM PYTHON SOURCE LINES 69-71

Print the final ensemble performance
------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 71-80

.. code-block:: default

    y_pred = api.predict(X_test)
    score = api.score(y_pred, y_test)
    print(score)
    # Print the final ensemble built by AutoPyTorch
    print(api.show_models())

    # Print statistics from search
    print(api.sprint_statistics())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'accuracy': 0.8554913294797688}
    |    | Preprocessing                                                                                    | Estimator                                                       |   Weight |
    |---:|:-------------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
    |  0 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,LinearSVC Preprocessor       | embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.28 |
    |  1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.28 |
    |  2 | SimpleImputer,Variance Threshold,NoCoalescer,NoEncoder,StandardScaler,SPC                        | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.24 |
    |  3 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.18 |
    |  4 | None                                                                                             | KNNLearner                                                      |     0.02 |
    autoPyTorch results:
        Dataset name: 5aca1730-22f6-11ed-8835-b1fa420cf160
        Optimisation Metric: accuracy
        Best validation score: 0.8713450292397661
        Number of target algorithm runs: 20
        Number of successful target algorithm runs: 15
        Number of crashed target algorithm runs: 4
        Number of target algorithms that exceeded the time limit: 1
        Number of target algorithms that exceeded the memory limit: 0


.. GENERATED FROM PYTHON SOURCE LINES 83-85

Cross-validation Resampling Strategy
=====================================

.. GENERATED FROM PYTHON SOURCE LINES 87-89

Build and fit a classifier with cross-validation resampling strategy
--------------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 89-94

.. code-block:: default

    api = TabularClassificationTask(
        resampling_strategy=CrossValTypes.k_fold_cross_validation,
        resampling_strategy_args={'num_splits': 3}
    )


.. GENERATED FROM PYTHON SOURCE LINES 95-97

Search for an ensemble of machine learning algorithms
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 97-108

.. code-block:: default

    api.search(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test.copy(),
        y_test=y_test.copy(),
        optimize_metric='accuracy',
        total_walltime_limit=150,
        func_eval_time_limit_secs=30
    )


.. GENERATED FROM PYTHON SOURCE LINES 109-111

Print the final ensemble performance
------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 111-120

.. code-block:: default

    y_pred = api.predict(X_test)
    score = api.score(y_pred, y_test)
    print(score)
    # Print the final ensemble built by AutoPyTorch
    print(api.show_models())

    # Print statistics from search
    print(api.sprint_statistics())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'accuracy': 0.8728323699421965}
    |    | Preprocessing                                                                                  | Estimator                                                       |   Weight |
    |---:|:-----------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
    |  0 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,LinearSVC Preprocessor     | embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.56 |
    |  1 | None                                                                                           | TabularTraditionalModel                                         |     0.16 |
    |  2 | None                                                                                           | TabularTraditionalModel                                         |     0.12 |
    |  3 | None                                                                                           | TabularTraditionalModel                                         |     0.08 |
    |  4 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,QuantileTransformer,TruncSVD        | no embedding,ResNetBackbone,FullyConnectedHead,nn.Sequential    |     0.04 |
    |  5 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,MinMaxScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.04 |
    autoPyTorch results:
        Dataset name: c3af43a2-22f6-11ed-8835-b1fa420cf160
        Optimisation Metric: accuracy
        Best validation score: 0.8626733083495604
        Number of target algorithm runs: 15
        Number of successful target algorithm runs: 11
        Number of crashed target algorithm runs: 4
        Number of target algorithms that exceeded the time limit: 0
        Number of target algorithms that exceeded the memory limit: 0


.. GENERATED FROM PYTHON SOURCE LINES 123-125

Stratified Resampling Strategy
===============================

.. GENERATED FROM PYTHON SOURCE LINES 127-129

Build and fit a classifier with stratified resampling strategy
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 129-137

.. code-block:: default

    api = TabularClassificationTask(
        # For demonstration purposes, we use
        # stratified holdout validation. However,
        # one can also use CrossValTypes.stratified_k_fold_cross_validation.
        resampling_strategy=HoldoutValTypes.stratified_holdout_validation,
        resampling_strategy_args={'val_share': 0.33}
    )


.. GENERATED FROM PYTHON SOURCE LINES 138-140

Search for an ensemble of machine learning algorithms
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 140-150

.. code-block:: default

    api.search(
        X_train=X_train,
        y_train=y_train,
        X_test=X_test.copy(),
        y_test=y_test.copy(),
        optimize_metric='accuracy',
        total_walltime_limit=150,
        func_eval_time_limit_secs=30
    )


.. GENERATED FROM PYTHON SOURCE LINES 151-153

Print the final ensemble performance
------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 153-161

.. code-block:: default

    y_pred = api.predict(X_test)
    score = api.score(y_pred, y_test)
    print(score)
    # Print the final ensemble built by AutoPyTorch
    print(api.show_models())

    # Print statistics from search
    print(api.sprint_statistics())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'accuracy': 0.8670520231213873}
    |    | Preprocessing                                                                                    | Estimator                                                       |   Weight |
    |---:|:-------------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
    |  0 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,LinearSVC Preprocessor       | embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.62 |
    |  1 | None                                                                                             | RFLearner                                                       |     0.14 |
    |  2 | None                                                                                             | KNNLearner                                                      |     0.12 |
    |  3 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential |     0.06 |
    |  4 | None                                                                                             | SVMLearner                                                      |     0.04 |
    |  5 | None                                                                                             | LGBMLearner                                                     |     0.02 |
    autoPyTorch results:
        Dataset name: 2b5c3792-22f7-11ed-8835-b1fa420cf160
        Optimisation Metric: accuracy
        Best validation score: 0.8362573099415205
        Number of target algorithm runs: 17
        Number of successful target algorithm runs: 13
        Number of crashed target algorithm runs: 3
        Number of target algorithms that exceeded the time limit: 1
        Number of target algorithms that exceeded the memory limit: 0


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 8 minutes  43.593 seconds)


.. _sphx_glr_download_examples_40_advanced_example_resampling_strategy.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/automl/Auto-PyTorch/development?urlpath=lab/tree/notebooks/examples/40_advanced/example_resampling_strategy.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example_resampling_strategy.py <example_resampling_strategy.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example_resampling_strategy.ipynb <example_resampling_strategy.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
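
As a supplementary illustration of the three resampling strategies compared in this example, the following plain-Python sketch shows how holdout, stratified holdout, and k-fold splits partition sample indices. It is not AutoPyTorch's implementation: the function names (``holdout_split``, ``stratified_holdout_split``, ``k_fold_splits``) and the toy label list are hypothetical, and the sketch only mirrors the meaning of the ``val_share`` and ``num_splits`` arguments used above.

```python
import random
from collections import defaultdict


def holdout_split(n_samples, val_share=0.33, seed=1):
    """Shuffle all indices once and hold out a `val_share` fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * val_share))
    return indices[n_val:], indices[:n_val]  # (train, validation)


def stratified_holdout_split(labels, val_share=0.33, seed=1):
    """Hold out a `val_share` fraction of every class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_val = int(round(len(members) * val_share))
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return sorted(train), sorted(val)


def k_fold_splits(n_samples, num_splits=3, seed=1):
    """Yield (train, validation) pairs; each fold is validation once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::num_splits] for i in range(num_splits)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# Toy, imbalanced labels (hypothetical data): 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10

train, val = holdout_split(len(labels))
print("holdout:", len(train), "train /", len(val), "validation")

s_train, s_val = stratified_holdout_split(labels)
print("stratified holdout, class-1 samples in validation:",
      sum(labels[i] for i in s_val))

for fold_id, (f_train, f_val) in enumerate(k_fold_splits(len(labels))):
    print("fold", fold_id, ":", len(f_train), "train /", len(f_val), "validation")
```

With ``val_share=0.33`` on 100 samples, both holdout variants keep 67 samples for training. Only the stratified variant fixes how many samples of each class end up in the validation set, which matters for imbalanced targets. The k-fold variant trains on every sample in some fold, at the cost of one fit per split.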