Interpretable models

The following example shows how to inspect the models which auto-sklearn optimizes over and how to restrict them to an interpretable subset.

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics

Show available classification models

We will first list all classifiers Auto-sklearn chooses from. A similar call is available for preprocessors (see below) and regression (not shown) as well.

from autosklearn.pipeline.components.classification import ClassifierChoice

for name in ClassifierChoice.get_components():
    print(name)

Out:

adaboost
bernoulli_nb
decision_tree
extra_trees
gaussian_nb
gradient_boosting
k_nearest_neighbors
lda
liblinear_svc
libsvm_svc
mlp
multinomial_nb
passive_aggressive
qda
random_forest
sgd

Show available preprocessors

from autosklearn.pipeline.components.feature_preprocessing import FeaturePreprocessorChoice

for name in FeaturePreprocessorChoice.get_components():
    print(name)

Out:

densifier
extra_trees_preproc_for_classification
extra_trees_preproc_for_regression
fast_ica
feature_agglomeration
kernel_pca
kitchen_sinks
liblinear_svc_preprocessor
no_preprocessing
nystroem_sampler
pca
polynomial
random_trees_embedding
select_percentile_classification
select_percentile_regression
select_rates_classification
select_rates_regression
truncatedSVD

Data Loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

Build and fit a classifier

We will now only use a subset of the given classifiers and preprocessors. Furthermore, we will restrict the ensemble size to 1 to only use the single best model in the end. However, we would like to note that the choice of which models is deemed interpretable is very much up to the user and can change from use case to use case.

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_interpretable_models_example_tmp',
    include={
        'classifier': [
            'decision_tree', 'lda', 'sgd'
        ],
        'feature_preprocessor': [
            'no_preprocessing', 'polynomial', 'select_percentile_classification'
        ]
    },
    ensemble_size=1,
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

Out:

AutoSklearnClassifier(ensemble_size=1,
                      include={'classifier': ['decision_tree', 'lda', 'sgd'],
                               'feature_preprocessor': ['no_preprocessing',
                                                        'polynomial',
                                                        'select_percentile_classification']},
                      per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_interpretable_models_example_tmp')

Get the Score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Out:

Accuracy score: 0.958041958041958

Total running time of the script: ( 1 minutes 55.571 seconds)

Gallery generated by Sphinx-Gallery