Tabular Classification with Custom Configuration Space
The following example shows how to adjust the configuration space of the search. Currently, two kinds of changes can be made to the space:
- Adjust individual hyperparameters in the pipeline.
- Include or exclude components:
  - include: Dictionary containing components to include. The key is the node name and the value is an iterable of the names of the components to include. Only these components will be present in the search space.
  - exclude: Dictionary containing components to exclude. The key is the node name and the value is an iterable of the names of the components to exclude. All components except these will be present in the search space.
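The include and exclude semantics described above can be sketched in plain Python. This is only an illustration of the filtering rule, not auto-PyTorch's implementation: the `AVAILABLE` registry and the `filter_components` helper are hypothetical, and the registry lists only component names that appear elsewhere in this example rather than the library's full set.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical registry: node name -> component names available for that node.
# Illustrative only; the real library offers more nodes and components.
AVAILABLE: Dict[str, List[str]] = {
    "network_backbone": ["MLPBackbone", "ResNetBackbone", "ShapedMLPBackbone"],
    "encoder": ["NoEncoder", "OneHotEncoder"],
}


def filter_components(
    available: Dict[str, List[str]],
    include: Optional[Dict[str, Iterable[str]]] = None,
    exclude: Optional[Dict[str, Iterable[str]]] = None,
) -> Dict[str, List[str]]:
    """Return the components left in the search space for every node."""
    include = include or {}
    exclude = exclude or {}
    result: Dict[str, List[str]] = {}
    for node, components in available.items():
        requested = set(include.get(node, [])) | set(exclude.get(node, []))
        unknown = requested - set(components)
        if unknown:
            raise ValueError(f"Unknown components for {node}: {sorted(unknown)}")
        kept = list(components)
        if node in include:
            # include: only the listed components remain
            wanted = set(include[node])
            kept = [c for c in kept if c in wanted]
        if node in exclude:
            # exclude: everything except the listed components remains
            dropped = set(exclude[node])
            kept = [c for c in kept if c not in dropped]
        result[node] = kept
    return result


print(filter_components(AVAILABLE, include={"encoder": ["OneHotEncoder"]}))
print(filter_components(AVAILABLE, exclude={"network_backbone": ["MLPBackbone"]}))
```

Nodes that are not mentioned in either dictionary keep all of their components, which is the behavior this sketch assumes for the real search space as well.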
import os
import tempfile as tmp
import warnings
os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)
import sklearn.datasets
import sklearn.model_selection
from autoPyTorch.api.tabular_classification import TabularClassificationTask
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates
def get_search_space_updates():
    """
    Search space updates to the task can be added using HyperparameterSearchSpaceUpdates

    Returns:
        HyperparameterSearchSpaceUpdates
    """
    updates = HyperparameterSearchSpaceUpdates()
    updates.append(node_name="data_loader",
                   hyperparameter="batch_size",
                   value_range=[16, 512],
                   default_value=32)
    updates.append(node_name="lr_scheduler",
                   hyperparameter="CosineAnnealingLR:T_max",
                   value_range=[50, 60],
                   default_value=55)
    updates.append(node_name='network_backbone',
                   hyperparameter='ResNetBackbone:dropout',
                   value_range=[0, 0.5],
                   default_value=0.2)
    return updates
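Each update above carries four pieces of information: the pipeline node, the hyperparameter name, a value range, and a default. The sketch below mirrors that shape with a hypothetical `SketchUpdate` dataclass (not the auto-PyTorch class) to show two things a reader may want to check before launching a search: that the default lies inside a numeric range, and how a name such as `CosineAnnealingLR:T_max` appears to split into a component and a hyperparameter. It assumes a two-element numeric range; categorical ranges are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union

Number = Union[int, float]


@dataclass(frozen=True)
class SketchUpdate:
    """Hypothetical stand-in for one search space update (illustration only)."""
    node_name: str
    hyperparameter: str
    value_range: Sequence[Number]
    default_value: Number

    def __post_init__(self) -> None:
        # Assumption: numeric ranges are given as [low, high].
        low, high = self.value_range
        if not low <= self.default_value <= high:
            raise ValueError(
                f"default {self.default_value} is outside [{low}, {high}] "
                f"for {self.node_name}/{self.hyperparameter}"
            )

    def split_name(self) -> Tuple[Optional[str], str]:
        """Split 'Component:name' into its parts; component is None if absent."""
        component, sep, name = self.hyperparameter.partition(":")
        return (component, name) if sep else (None, component)


updates = [
    SketchUpdate("data_loader", "batch_size", [16, 512], 32),
    SketchUpdate("lr_scheduler", "CosineAnnealingLR:T_max", [50, 60], 55),
    SketchUpdate("network_backbone", "ResNetBackbone:dropout", [0, 0.5], 0.2),
]
for update in updates:
    print(update.node_name, update.split_name(), update.value_range)
```

Under this reading, an update with a component prefix would only matter when that component is actually part of the search space, for example `ResNetBackbone:dropout` when `ResNetBackbone` has not been excluded.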
Data Loading
X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X,
    y,
    random_state=1,
)
Build and fit a classifier with include components
api = TabularClassificationTask(
    search_space_updates=get_search_space_updates(),
    include_components={'network_backbone': ['MLPBackbone', 'ResNetBackbone'],
                        'encoder': ['OneHotEncoder']}
)
Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train.copy(),
    y_train=y_train.copy(),
    X_test=X_test.copy(),
    y_test=y_test.copy(),
    optimize_metric='accuracy',
    total_walltime_limit=150,
    func_eval_time_limit_secs=30
)
<autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f6e0e590700>
Print the final ensemble performance
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print(score)
print(api.show_models())
# Print statistics from search
print(api.sprint_statistics())
{'accuracy': 0.8497109826589595}
| | Preprocessing | Estimator | Weight |
|---:|:-------------------------------------------------------------------------------------------------|:----------------------------------------------------------|---------:|
| 0 | None | LGBMLearner | 0.24 |
| 1 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,NoScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.18 |
| 2 | None | RFLearner | 0.16 |
| 3 | None | ETLearner | 0.12 |
| 4 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.1 |
| 5 | None | KNNLearner | 0.06 |
| 6 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,LinearSVC Preprocessor | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.04 |
| 7 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.04 |
| 8 | SimpleImputer,Variance Threshold,MinorityCoalescer,OneHotEncoder,NoScaler,PCA | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
| 9 | None | SVMLearner | 0.02 |
| 10 | SimpleImputer,Variance Threshold,NoCoalescer,OneHotEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,MLPBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
autoPyTorch results:
Dataset name: 5e06805c-2307-11ed-884d-557eb8b24584
Optimisation Metric: accuracy
Best validation score: 0.8596491228070176
Number of target algorithm runs: 18
Number of successful target algorithm runs: 15
Number of crashed target algorithm runs: 2
Number of target algorithms that exceeded the time limit: 1
Number of target algorithms that exceeded the memory limit: 0
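The Weight column in the table above sums to 1 across the eleven ensemble members. A minimal sketch of how such weights could combine per-model class probabilities follows, assuming the final prediction is a weighted average of the members' predicted probabilities. The `weighted_average` helper and the probability values are made up for illustration; only the weights are copied from the table.

```python
from typing import List, Sequence

# Weights copied from the Weight column of the table above.
WEIGHTS = [0.24, 0.18, 0.16, 0.12, 0.10, 0.06, 0.04, 0.04, 0.02, 0.02, 0.02]


def weighted_average(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine per-model class probabilities for one sample into one distribution."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one probability vector per weight")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probas))
        for c in range(n_classes)
    ]


# Made-up predictions for a single sample: the five heaviest members favor
# class 1, the six lightest members favor class 0.
probas = [[0.3, 0.7]] * 5 + [[0.8, 0.2]] * 6
combined = weighted_average(probas, WEIGHTS)
print([round(p, 2) for p in combined])
```

In this made-up case the five heaviest members hold 0.80 of the total weight, so the combined distribution leans toward class 1 (roughly 0.40 versus 0.60) even though more members vote for class 0.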
Build and fit a classifier with exclude components
api = TabularClassificationTask(
    search_space_updates=get_search_space_updates(),
    exclude_components={'network_backbone': ['MLPBackbone'],
                        'encoder': ['OneHotEncoder']}
)
Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test.copy(),
    y_test=y_test.copy(),
    optimize_metric='accuracy',
    total_walltime_limit=150,
    func_eval_time_limit_secs=30
)
<autoPyTorch.api.tabular_classification.TabularClassificationTask object at 0x7f6e0df0be80>
Print the final ensemble performance
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print(score)
print(api.show_models())
# Print statistics from search
print(api.sprint_statistics())
{'accuracy': 0.8670520231213873}
| | Preprocessing | Estimator | Weight |
|---:|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------|---------:|
| 0 | None | LGBMLearner | 0.32 |
| 1 | None | SVMLearner | 0.28 |
| 2 | None | RFLearner | 0.26 |
| 3 | None | ETLearner | 0.1 |
| 4 | SimpleImputer,Variance Threshold,NoCoalescer,NoEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
| 5 | SimpleImputer,Variance Threshold,NoCoalescer,NoEncoder,StandardScaler,NoFeaturePreprocessing | no embedding,ShapedMLPBackbone,FullyConnectedHead,nn.Sequential | 0.02 |
autoPyTorch results:
Dataset name: caae602a-2307-11ed-884d-557eb8b24584
Optimisation Metric: accuracy
Best validation score: 0.8596491228070176
Number of target algorithm runs: 17
Number of successful target algorithm runs: 13
Number of crashed target algorithm runs: 3
Number of target algorithms that exceeded the time limit: 1
Number of target algorithms that exceeded the memory limit: 0
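The statistics printed above are plain text, so comparing runs programmatically requires parsing them. The sketch below uses a hypothetical `parse_run_counts` helper, written for this example and not part of auto-PyTorch, that extracts the `Number of ...` lines and checks that successful, crashed, and limit-exceeding runs account for every run. The input text is copied from the output of the exclude run; the line format is assumed to match what is shown there.

```python
from typing import Dict

# Statistics text copied from the output of the exclude run above.
STATS_TEXT = """\
autoPyTorch results:
Dataset name: caae602a-2307-11ed-884d-557eb8b24584
Optimisation Metric: accuracy
Best validation score: 0.8596491228070176
Number of target algorithm runs: 17
Number of successful target algorithm runs: 13
Number of crashed target algorithm runs: 3
Number of target algorithms that exceeded the time limit: 1
Number of target algorithms that exceeded the memory limit: 0
"""


def parse_run_counts(text: str) -> Dict[str, int]:
    """Collect every 'Number of ...: <int>' line into a dictionary."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key.startswith("Number of") and value.isdigit():
            counts[key] = int(value)
    return counts


counts = parse_run_counts(STATS_TEXT)
total = counts["Number of target algorithm runs"]
accounted = sum(v for k, v in counts.items() if k != "Number of target algorithm runs")
success_rate = counts["Number of successful target algorithm runs"] / total
print(f"runs: {total}, accounted for: {accounted}, success rate: {success_rate:.2f}")
```

The same check holds for the include run, where 15 successful, 2 crashed, and 1 timed-out evaluation add up to the 18 reported runs.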
Total running time of the script: 6 minutes 4.302 seconds