Tabular Classification with Greedy Portfolio
The following example shows how to fit a sample classification model with AutoPyTorch using the greedy portfolio.
import os
import tempfile as tmp
import warnings
os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)
import sklearn.datasets
import sklearn.model_selection
from autoPyTorch.api.tabular_classification import TabularClassificationTask
Data Loading
X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X,
    y,
    random_state=42,
)
Build and fit a classifier
api = TabularClassificationTask(
    seed=42,
)
Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test.copy(),
    y_test=y_test.copy(),
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50,
    # Setting this option to "greedy"
    # will make smac run the configurations
    # present in 'autoPyTorch/configs/greedy_portfolio.json'
    portfolio_selection="greedy"
)
<autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f9aa6377ca0>
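The comment in the call above says that `portfolio_selection="greedy"` makes SMAC run the configurations stored in `autoPyTorch/configs/greedy_portfolio.json`. As a rough sketch of how such a file could be inspected, the snippet below assumes the portfolio is a JSON list of configuration dictionaries. That layout, the helper name `load_portfolio`, and the toy hyperparameter keys are illustrative assumptions rather than something this example confirms.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_portfolio(path: Path) -> List[Dict[str, Any]]:
    """Load a portfolio file, assuming it is a JSON list of config dicts."""
    with path.open() as fh:
        configs = json.load(fh)
    if not isinstance(configs, list):
        raise ValueError("expected a JSON list of configurations")
    return configs


# Toy stand-in for a portfolio file; the keys and values are made up
# for illustration and are not taken from the real greedy portfolio.
toy_portfolio = [
    {"network_backbone:__choice__": "MLPBackbone",
     "optimizer:__choice__": "AdamOptimizer"},
    {"network_backbone:__choice__": "ShapedResNetBackbone",
     "optimizer:__choice__": "SGDOptimizer"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    portfolio_path = Path(tmp_dir) / "toy_portfolio.json"
    portfolio_path.write_text(json.dumps(toy_portfolio))
    configs = load_portfolio(portfolio_path)

# List which backbone each portfolio entry would try.
backbones = [c["network_backbone:__choice__"] for c in configs]
print(len(configs), backbones)
```

To look at the real portfolio, point `load_portfolio` at the JSON file inside the installed package. Whether `portfolio_selection` also accepts a path to a custom file depends on the installed version, so check its API documentation before relying on that.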
Print the final ensemble performance
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print(score)
# Print the final ensemble built by AutoPyTorch
print(api.show_models())
# Print statistics from search
print(api.sprint_statistics())
{'accuracy': 0.8786127167630058}
| | Preprocessing | Estimator | Weight |
|---:|:-------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|---------:|
| 0 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,MinMaxScaler,PolynomialFeatures | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.32 |
| 1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,KernelPCA | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.16 |
| 2 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.14 |
| 3 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,MinMaxScaler,FastICA | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.12 |
| 4 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential | 0.08 |
| 5 | None | CBLearner | 0.06 |
| 6 | None | ETLearner | 0.06 |
| 7 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
| 8 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedResNetBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
| 9 | None | RFLearner | 0.02 |
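The `Weight` column in the table above sums to 1, which suggests the ensemble prediction is a weighted combination of its members. The sketch below illustrates one common way to do that, a weighted average of per-member class probabilities, under the assumption that this is how the weights are applied. The function name `weighted_average_proba` and the member probabilities are made up for illustration; only the weights come from the table.

```python
from typing import List, Sequence

# Weights copied from the "Weight" column of the table above.
weights = [0.32, 0.16, 0.14, 0.12, 0.08, 0.06, 0.06, 0.02, 0.02, 0.02]


def weighted_average_proba(
    member_probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-member class probabilities for one sample."""
    n_classes = len(member_probas[0])
    return [
        sum(w * proba[k] for w, proba in zip(weights, member_probas))
        for k in range(n_classes)
    ]


# Made-up probabilities for a single sample and two classes,
# one row per ensemble member (hypothetical values).
member_probas = [[0.9, 0.1]] * 5 + [[0.2, 0.8]] * 5

combined = weighted_average_proba(member_probas, weights)
# The predicted class is the index with the highest combined probability.
predicted_class = max(range(len(combined)), key=combined.__getitem__)

print(round(sum(weights), 2))
print([round(p, 3) for p in combined], predicted_class)
```

In this toy case the five highest-weighted members agree on class 0, so their combined weight outvotes the remaining five members even though the member count is split evenly.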
autoPyTorch results:
Dataset name: 55bb3950-22f3-11ed-8835-b1fa420cf160
Optimisation Metric: accuracy
Best validation score: 0.8830409356725146
Number of target algorithm runs: 25
Number of successful target algorithm runs: 23
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 2
Number of target algorithms that exceeded the memory limit: 0
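The statistics above are printed as plain `key: value` lines, so they are easy to turn into a dictionary for logging or quick checks. The following is a minimal sketch that parses the text shown above; `parse_statistics` is a hypothetical helper, not part of AutoPyTorch, and the real output may differ in whitespace.

```python
from typing import Dict

# Text copied from the statistics output above.
stats_text = """\
autoPyTorch results:
Dataset name: 55bb3950-22f3-11ed-8835-b1fa420cf160
Optimisation Metric: accuracy
Best validation score: 0.8830409356725146
Number of target algorithm runs: 25
Number of successful target algorithm runs: 23
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 2
Number of target algorithms that exceeded the memory limit: 0
"""


def parse_statistics(text: str) -> Dict[str, str]:
    """Split each 'key: value' line into a dictionary entry."""
    parsed: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a value, such as the header line.
        if sep and value.strip():
            parsed[key.strip()] = value.strip()
    return parsed


stats = parse_statistics(stats_text)
total = int(stats["Number of target algorithm runs"])
succeeded = int(stats["Number of successful target algorithm runs"])
timed_out = int(
    stats["Number of target algorithms that exceeded the time limit"]
)

print(f"{succeeded}/{total} runs succeeded, {timed_out} hit the time limit")
```

Read this way, the two runs that did not succeed are accounted for by the time limit rather than by crashes or memory errors.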
Total running time of the script: (5 minutes 21.979 seconds)