Extending auto-sklearn

auto-sklearn can be extended with new classification, regression and feature preprocessing methods. To do so, you implement a wrapper class and register it with auto-sklearn. This manual walks you through the process.

Writing a component

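Depending on its purpose, the component has to be a subclass of one of the following base classes from autosklearn.pipeline.components.base:

  • AutoSklearnClassificationAlgorithm

  • AutoSklearnRegressionAlgorithm

  • AutoSklearnPreprocessingAlgorithm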

In general, these classes are wrappers around existing machine learning models and only add the functionality auto-sklearn needs. You can also implement a machine learning algorithm directly inside a component.

Each component has to implement a method that returns its configuration space, a method for querying the component's properties, and methods such as fit(), predict() or transform(), depending on the component's task. The first two are described in the subsections get_hyperparameter_search_space() and get_properties().

After writing a component class, you have to tell auto-sklearn that it exists. Register it with one of the following function calls, depending on the type of component:

autosklearn.pipeline.components.classification.add_classifier(classifier: Type[autosklearn.pipeline.components.base.AutoSklearnClassificationAlgorithm]) -> None

autosklearn.pipeline.components.regression.add_regressor(regressor: Type[autosklearn.pipeline.components.base.AutoSklearnRegressionAlgorithm]) -> None

autosklearn.pipeline.components.feature_preprocessing.add_preprocessor(preprocessor: Type[autosklearn.pipeline.components.base.AutoSklearnPreprocessingAlgorithm]) -> None

get_hyperparameter_search_space()

Return an instance of ConfigSpace.configuration_space.ConfigurationSpace.

See also the abstract definitions:

  • AutoSklearnClassificationAlgorithm.get_hyperparameter_search_space()

  • AutoSklearnRegressionAlgorithm.get_hyperparameter_search_space()

  • AutoSklearnPreprocessingAlgorithm.get_hyperparameter_search_space()

To learn how to create a ConfigurationSpace object, please look at the source code on github.com.
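As a rough illustration of what such a method might return, the sketch below builds a small search space for a hypothetical k-nearest-neighbours component. The hyperparameter names, ranges and defaults are invented for the example, and the dataset_properties argument is an assumption about the method signature rather than something stated on this page. In a component class the function would normally be a static method.

```python
def get_hyperparameter_search_space(dataset_properties=None):
    """Search space of a hypothetical k-nearest-neighbours component."""
    # Imported inside the function so this sketch can be loaded even when
    # ConfigSpace is not installed; a real component may import at the top.
    from ConfigSpace.configuration_space import ConfigurationSpace
    from ConfigSpace.hyperparameters import (
        CategoricalHyperparameter,
        UniformIntegerHyperparameter,
    )

    cs = ConfigurationSpace()
    # An integer hyperparameter with an inclusive range and a default.
    n_neighbors = UniformIntegerHyperparameter(
        name="n_neighbors", lower=1, upper=100, default_value=5
    )
    # A categorical hyperparameter with a fixed set of choices.
    weights = CategoricalHyperparameter(
        name="weights", choices=["uniform", "distance"], default_value="uniform"
    )
    cs.add_hyperparameters([n_neighbors, weights])
    return cs
```

Newer ConfigSpace releases may prefer a different method for adding hyperparameters, so check the version you have installed.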

get_properties()

Return a dictionary which defines how the component can be used when constructing a machine learning pipeline. The following fields must be specified:

  • shortname : str

    an abbreviation of the component

  • name : str

    the full name of the component

  • handles_regression : bool

    whether the component can handle regression data

  • handles_classification : bool

    whether the component can handle classification data

  • handles_multiclass : bool

    whether the component can handle multiclass classification data

  • handles_multilabel : bool

    whether the component can handle multilabel classification data

  • is_deterministic : bool

    whether the component gives the same result when it is run several times with the same random seed

  • input : tuple

    type of input data the component can handle, can have multiple values:

    • autosklearn.constants.DENSE

      dense data arrays, mutually exclusive with autosklearn.constants.SPARSE

    • autosklearn.constants.SPARSE

      sparse data matrices, mutually exclusive with autosklearn.constants.DENSE

    • autosklearn.constants.UNSIGNED_DATA

      unsigned data array, meaning only positive input, mutually exclusive with autosklearn.constants.SIGNED_DATA

    • autosklearn.constants.SIGNED_DATA

      signed data array, meaning both positive and negative input values, mutually exclusive with autosklearn.constants.UNSIGNED_DATA

  • output : tuple

    type of output data the component produces

    • autosklearn.constants.PREDICTIONS

      predictions, for example by a classifier

    • autosklearn.constants.INPUT

      data in the same form as the input

    • autosklearn.constants.DENSE

      dense data arrays, mutually exclusive with autosklearn.constants.SPARSE. This implies that sparse data will be converted into a dense representation.

    • autosklearn.constants.SPARSE

      sparse data matrices, mutually exclusive with autosklearn.constants.DENSE. This implies that dense data will be converted into a sparse representation.

    • autosklearn.constants.UNSIGNED_DATA

      unsigned data array, meaning only positive input, mutually exclusive with autosklearn.constants.SIGNED_DATA. This allows for algorithms which can only work on positive data.

    • autosklearn.constants.SIGNED_DATA

      signed data array, meaning both positive and negative input values, mutually exclusive with autosklearn.constants.UNSIGNED_DATA
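The fields listed above can be sketched as a plain dictionary. The example below describes a hypothetical k-nearest-neighbours classifier; the field values, the dataset_properties argument and the missing_or_mistyped() helper are assumptions made for illustration and are not part of auto-sklearn. The constants are stand-in strings so that the sketch runs on its own; a real component would import them from auto-sklearn.

```python
# Stand-ins for the constants the text places under autosklearn.constants.
DENSE, SPARSE = "DENSE", "SPARSE"
UNSIGNED_DATA, SIGNED_DATA = "UNSIGNED_DATA", "SIGNED_DATA"
PREDICTIONS, INPUT = "PREDICTIONS", "INPUT"

# Required fields and the Python type each one is expected to have.
FIELD_TYPES = {
    "shortname": str,
    "name": str,
    "handles_regression": bool,
    "handles_classification": bool,
    "handles_multiclass": bool,
    "handles_multilabel": bool,
    "is_deterministic": bool,
    "input": tuple,
    "output": tuple,
}


def get_properties(dataset_properties=None):
    """Properties of a hypothetical k-nearest-neighbours classifier."""
    return {
        "shortname": "KNN",
        "name": "K-Nearest Neighbor Classification",
        "handles_regression": False,
        "handles_classification": True,
        "handles_multiclass": True,
        "handles_multilabel": True,
        "is_deterministic": True,
        "input": (DENSE, UNSIGNED_DATA),
        "output": (PREDICTIONS,),  # note the trailing comma: a one-element tuple
    }


def missing_or_mistyped(properties):
    """Return the required fields that are absent or have the wrong type."""
    return [
        field
        for field, expected in FIELD_TYPES.items()
        if not isinstance(properties.get(field), expected)
    ]
```

A quick check such as missing_or_mistyped(get_properties()) can catch a forgotten field before the component is registered.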

Classification

In addition to get_properties() and get_hyperparameter_search_space(), you have to implement AutoSklearnClassificationAlgorithm.fit() and AutoSklearnClassificationAlgorithm.predict(). These are an implementation of the scikit-learn predictor API.
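To make the fit() and predict() contract concrete, here is a minimal sketch of an algorithm implemented directly inside a component. The MajorityClassClassifier class is hypothetical, and the base class is a stand-in so that the sketch runs without auto-sklearn installed. The random_state argument is likewise an assumption.

```python
from collections import Counter


# In a real component, import the base class instead:
#   from autosklearn.pipeline.components.base import AutoSklearnClassificationAlgorithm
class AutoSklearnClassificationAlgorithm:
    """Stand-in base class so the sketch runs on its own."""


class MajorityClassClassifier(AutoSklearnClassificationAlgorithm):
    """Hypothetical component that always predicts the most frequent label."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self.majority_label_ = None

    def fit(self, X, y):
        # Learn from the training labels and return self, as scikit-learn
        # estimators do, so that calls can be chained.
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        if self.majority_label_ is None:
            raise RuntimeError("fit() must be called before predict()")
        # One prediction per input row.
        return [self.majority_label_ for _ in X]
```

A regression component follows the same pattern, with predict() returning numeric values instead of class labels.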

Regression

In addition to get_properties() and get_hyperparameter_search_space(), you have to implement AutoSklearnRegressionAlgorithm.fit() and AutoSklearnRegressionAlgorithm.predict(). These are an implementation of the scikit-learn predictor API.

Feature Preprocessing

In addition to get_properties() and get_hyperparameter_search_space(), you have to implement AutoSklearnPreprocessingAlgorithm.fit() and AutoSklearnPreprocessingAlgorithm.transform(). These are an implementation of the scikit-learn transformer API.
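The fit() and transform() pair can be sketched in the same way. The DropConstantColumns class below is hypothetical, works on plain lists of rows, and uses a stand-in base class so that it runs without auto-sklearn installed. The y and random_state arguments are assumptions.

```python
# In a real component, import the base class instead:
#   from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm
class AutoSklearnPreprocessingAlgorithm:
    """Stand-in base class so the sketch runs on its own."""


class DropConstantColumns(AutoSklearnPreprocessingAlgorithm):
    """Hypothetical preprocessor that removes columns holding a single value."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self.keep_ = None

    def fit(self, X, y=None):
        # Remember the indices of columns that take more than one value.
        columns = list(zip(*X))
        self.keep_ = [i for i, col in enumerate(columns) if len(set(col)) > 1]
        return self

    def transform(self, X):
        if self.keep_ is None:
            raise RuntimeError("fit() must be called before transform()")
        # Keep only the columns selected during fit().
        return [[row[i] for i in self.keep_] for row in X]
```

Because transform() reuses the column indices learned in fit(), new data is reduced to the same columns as the training data.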