Interpretable models

This example shows how to list the models that auto-sklearn optimizes over and how to restrict the search to an interpretable subset.

from pprint import pprint

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection  # needed for train_test_split below

Show available classification models

We first list all classifiers auto-sklearn chooses from. A similar call is available for preprocessors (see below) and for regressors (not shown).

from autosklearn.pipeline.components.classification import ClassifierChoice

for name in ClassifierChoice.get_components():
    print(name)

adaboost
bernoulli_nb
decision_tree
extra_trees
gaussian_nb
gradient_boosting
k_nearest_neighbors
lda
liblinear_svc
libsvm_svc
mlp
multinomial_nb
passive_aggressive
qda
random_forest
sgd
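The regression counterpart mentioned above can be sketched in the same way. This is a minimal sketch that assumes `RegressorChoice` is importable from `autosklearn.pipeline.components.regression` and exposes the same `get_components()` call as `ClassifierChoice`. The helper name `component_names` is hypothetical and not part of auto-sklearn. The import is guarded so the snippet still runs where auto-sklearn is not installed.

```python
def component_names(choice_cls):
    """Return the component names of a *Choice class, in the order it reports them.

    Hypothetical helper: it only relies on `get_components()` returning a
    mapping keyed by component name, as in the classifier listing above.
    """
    return list(choice_cls.get_components())


try:
    # Assumption: the regression components mirror the classification ones.
    from autosklearn.pipeline.components.regression import RegressorChoice
except ImportError:
    print("auto-sklearn is not installed; skipping the regressor listing")
else:
    for name in component_names(RegressorChoice):
        print(name)
```

The same helper works for `ClassifierChoice` and `FeaturePreprocessorChoice`, since all three are queried through `get_components()`.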

Show available preprocessors

from autosklearn.pipeline.components.feature_preprocessing import (
    FeaturePreprocessorChoice,
)

for name in FeaturePreprocessorChoice.get_components():
    print(name)

densifier
extra_trees_preproc_for_classification
extra_trees_preproc_for_regression
fast_ica
feature_agglomeration
kernel_pca
kitchen_sinks
liblinear_svc_preprocessor
no_preprocessing
nystroem_sampler
pca
polynomial
random_trees_embedding
select_percentile_classification
select_percentile_regression
select_rates_classification
select_rates_regression
truncatedSVD

Data Loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

Build and fit a classifier

We now use only a subset of the available classifiers and preprocessors. Furthermore, we restrict the ensemble size to 1 so that only the single best model is used in the end. Note that which models are deemed interpretable is very much up to the user and can change from use case to use case.

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_interpretable_models_example_tmp",
    include={
        "classifier": ["decision_tree", "lda", "sgd"],
        "feature_preprocessor": [
            "no_preprocessing",
            "polynomial",
            "select_percentile_classification",
        ],
    },
    ensemble_kwargs={"ensemble_size": 1},
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")

AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      ensemble_kwargs={'ensemble_size': 1},
                      include={'classifier': ['decision_tree', 'lda', 'sgd'],
                               'feature_preprocessor': ['no_preprocessing',
                                                        'polynomial',
                                                        'select_percentile_classification']},
                      per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_interpretable_models_example_tmp')
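Because the names in `include` are plain strings, it can help to check them against the component listings printed earlier before spending the time budget on a fit. The following is a minimal sketch, not part of auto-sklearn: the helper `unknown_components` is hypothetical, and the name sets are copied by hand from the output shown above, so they may differ for other auto-sklearn versions.

```python
# Component names copied from the listings printed earlier in this example.
KNOWN = {
    "classifier": {
        "adaboost", "bernoulli_nb", "decision_tree", "extra_trees",
        "gaussian_nb", "gradient_boosting", "k_nearest_neighbors", "lda",
        "liblinear_svc", "libsvm_svc", "mlp", "multinomial_nb",
        "passive_aggressive", "qda", "random_forest", "sgd",
    },
    "feature_preprocessor": {
        "densifier", "extra_trees_preproc_for_classification",
        "extra_trees_preproc_for_regression", "fast_ica",
        "feature_agglomeration", "kernel_pca", "kitchen_sinks",
        "liblinear_svc_preprocessor", "no_preprocessing", "nystroem_sampler",
        "pca", "polynomial", "random_trees_embedding",
        "select_percentile_classification", "select_percentile_regression",
        "select_rates_classification", "select_rates_regression",
        "truncatedSVD",
    },
}


def unknown_components(include):
    """Return {step: [names]} for names that are not in the listings above.

    Steps other than "classifier" and "feature_preprocessor" are not checked.
    """
    problems = {}
    for step, names in include.items():
        if step not in KNOWN:
            continue
        bad = [name for name in names if name not in KNOWN[step]]
        if bad:
            problems[step] = bad
    return problems


include = {
    "classifier": ["decision_tree", "lda", "sgd"],
    "feature_preprocessor": [
        "no_preprocessing",
        "polynomial",
        "select_percentile_classification",
    ],
}
print(unknown_components(include))  # {} because every name is listed above
```

An empty result means every requested name appears in the listings. A misspelled name would show up under its step.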

Print the final ensemble constructed by auto-sklearn

pprint(automl.show_models(), indent=4)

{   28: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d0c22910>,
            'cost': 0.007092198581560294,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d6264250>,
            'ensemble_weight': 1.0,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d0c22490>,
            'model_id': 28,
            'rank': 1,
            'sklearn_classifier': SGDClassifier(alpha=0.0003272354910051561, average=True,
              eta0=2.9976399065090562e-05, l1_ratio=0.14999999999999974,
              learning_rate='invscaling', loss='squared_hinge', max_iter=1024,
              penalty='elasticnet', power_t=0.5037491320052959, random_state=1,
              tol=2.59922433981394e-05, warm_start=True)}}
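The dictionary printed above is keyed by model id, and each entry carries fields such as `rank`, `ensemble_weight`, `cost` and `sklearn_classifier`. A compact summary is often easier to read than the full dump. The sketch below is a hypothetical helper that assumes only the structure visible in that output. It is demonstrated on a hand-built stand-in dictionary, with `FakeEstimator` as a placeholder for the fitted scikit-learn object.

```python
def summarize_models(models):
    """Condense a show_models()-style dict into one small row per model.

    Assumes each entry has "rank", "ensemble_weight" and "cost" keys and,
    optionally, a "sklearn_classifier" object, as in the output above.
    """
    rows = []
    for model_id, info in models.items():
        estimator = info.get("sklearn_classifier")
        rows.append(
            {
                "model_id": model_id,
                "rank": info["rank"],
                "ensemble_weight": info["ensemble_weight"],
                "cost": info["cost"],
                "estimator": type(estimator).__name__ if estimator is not None else None,
            }
        )
    # Best-ranked model first.
    return sorted(rows, key=lambda row: row["rank"])


class FakeEstimator:
    """Placeholder standing in for a fitted scikit-learn estimator."""


# Stand-in data mirroring the numeric fields printed above.
example = {
    28: {
        "model_id": 28,
        "rank": 1,
        "ensemble_weight": 1.0,
        "cost": 0.007092198581560294,
        "sklearn_classifier": FakeEstimator(),
    }
}

for row in summarize_models(example):
    print(row)
```

With a fitted estimator, `summarize_models(automl.show_models())` would be the corresponding call, under the same structural assumption.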

Get the Score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score: 0.9440559440559441
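For reference, accuracy is the fraction of predictions that exactly match the true labels. The plain-Python sketch below illustrates that definition. The function name `accuracy` is hypothetical and it is not a replacement for `sklearn.metrics.accuracy_score`. The reported score equals 135/143, which would be consistent with 135 correct predictions on a 143-sample test split, although the original output does not state these counts.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Three of the four predictions match the labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```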

Total running time of the script: (1 minute 54.458 seconds)
