Tabular Classification with Greedy Portfolio

The following example shows how to fit a sample classification model with AutoPyTorch using the greedy portfolio, a predefined set of configurations that SMAC runs during the search.

import os
import tempfile as tmp
import warnings

os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

import sklearn.datasets
import sklearn.model_selection

from autoPyTorch.api.tabular_classification import TabularClassificationTask

Data Loading

X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X,
    y,
    random_state=42,
)

Build and fit a classifier

api = TabularClassificationTask(
    seed=42,
)

Search for an ensemble of machine learning algorithms

api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test.copy(),
    y_test=y_test.copy(),
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50,
    # Setting portfolio_selection to "greedy" makes SMAC run
    # the configurations listed in
    # 'autoPyTorch/configs/greedy_portfolio.json'.
    portfolio_selection="greedy"
)

<autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f9aa6377ca0>
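The interplay between the portfolio and the two time limits can be illustrated with a toy loop. This is only a sketch, not AutoPyTorch's or SMAC's implementation: it assumes the portfolio entries are tried before any sampled configurations, and the helper names (`candidate_configs`, `run_search`), the hyperparameter keys, and the per-run costs are all hypothetical. It uses only the Python standard library.

```python
import json
import random

# Hypothetical stand-in for a portfolio file: a JSON list of configurations.
# The real greedy_portfolio.json uses AutoPyTorch's own hyperparameter names.
PORTFOLIO_JSON = """
[
  {"backbone": "ShapedMLP", "lr": 0.01},
  {"backbone": "ShapedResNet", "lr": 0.001},
  {"backbone": "MLP", "lr": 0.1}
]
"""


def candidate_configs(portfolio, n_total, seed=42):
    """Yield the portfolio entries first, then randomly sampled configurations."""
    rng = random.Random(seed)
    for i in range(n_total):
        if i < len(portfolio):
            yield dict(portfolio[i])
        else:
            yield {
                "backbone": rng.choice(["MLP", "ShapedMLP", "ShapedResNet"]),
                "lr": 10 ** rng.uniform(-4, -1),
            }


def run_search(portfolio, run_costs, total_walltime_limit, func_eval_time_limit_secs):
    """Toy loop: stop once the total budget is spent, flag runs over the per-run limit."""
    elapsed = 0.0
    history = []
    configs = candidate_configs(portfolio, len(run_costs))
    for config, cost in zip(configs, run_costs):
        if elapsed >= total_walltime_limit:
            break
        timed_out = cost > func_eval_time_limit_secs
        # A run that exceeds its limit is cut off at the limit.
        elapsed += min(cost, func_eval_time_limit_secs)
        history.append((config, "timeout" if timed_out else "success"))
    return history


portfolio = json.loads(PORTFOLIO_JSON)
# Made-up per-run costs in seconds, for illustration only.
history = run_search(
    portfolio,
    run_costs=[20, 70, 30, 40, 10],
    total_walltime_limit=300,
    func_eval_time_limit_secs=50,
)
for config, status in history:
    print(status, config)
```

In this toy setting, a configuration whose cost exceeds `func_eval_time_limit_secs` is recorded as a timeout and charged only the limit against the total budget, which mirrors how the search statistics separate successful runs from those that exceeded the time limit.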

Print the final ensemble performance

y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print(score)
# Print the final ensemble built by AutoPyTorch
print(api.show_models())

# Print statistics from search
print(api.sprint_statistics())

{'accuracy': 0.8786127167630058}
|    | Preprocessing                                                                                    | Estimator                                                          |   Weight |
|---:|:-------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|---------:|
|  0 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,MinMaxScaler,PolynomialFeatures | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.32 |
|  1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,KernelPCA                    | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential          |     0.16 |
|  2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential    |     0.14 |
|  3 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,MinMaxScaler,FastICA                  | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential          |     0.12 |
|  4 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.08 |
|  5 | None                                                                                             | CBLearner                                                          |     0.06 |
|  6 | None                                                                                             | ETLearner                                                          |     0.06 |
|  7 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.02 |
|  8 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential |     0.02 |
|  9 | None                                                                                             | RFLearner                                                          |     0.02 |
autoPyTorch results:
        Dataset name: 55bb3950-22f3-11ed-8835-b1fa420cf160
        Optimisation Metric: accuracy
        Best validation score: 0.8830409356725146
        Number of target algorithm runs: 25
        Number of successful target algorithm runs: 23
        Number of crashed target algorithm runs: 0
        Number of target algorithms that exceeded the time limit: 2
        Number of target algorithms that exceeded the memory limit: 0
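The Weight column of the ensemble table can be read as each member's share in the final prediction. The sketch below assumes the ensemble forms a weighted average of the members' class probabilities; that is an assumption for illustration, not a statement of AutoPyTorch's internals. The weights are copied from the table, while the probabilities and the `weighted_vote` helper are made up.

```python
# Weights copied from the "Weight" column of the ensemble table above.
weights = [0.32, 0.16, 0.14, 0.12, 0.08, 0.06, 0.06, 0.02, 0.02, 0.02]


def weighted_vote(member_probas, weights):
    """Combine per-member class probabilities into one weighted average."""
    n_classes = len(member_probas[0])
    combined = [0.0] * n_classes
    for proba, weight in zip(member_probas, weights):
        for k in range(n_classes):
            combined[k] += weight * proba[k]
    return combined


# Made-up probabilities for one sample and two classes,
# one row per ensemble member (hypothetical values).
member_probas = [
    [0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8], [0.6, 0.4],
    [0.5, 0.5], [0.3, 0.7], [0.8, 0.2], [0.1, 0.9], [0.6, 0.4],
]

combined = weighted_vote(member_probas, weights)
# The predicted class is the index with the highest combined probability.
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(round(sum(weights), 2), [round(p, 3) for p in combined], predicted_class)
```

Because the listed weights sum to 1, the combined vector remains a probability distribution whenever each member's row is one.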

Total running time of the script: (5 minutes 21.979 seconds)
