APIs

Main modules

Classification

class autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=3600, per_run_time_limit=360, initial_configurations_via_metalearning=25, ensemble_size=50, ensemble_nbest=50, seed=1, ml_memory_limit=3072, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, output_folder=None, delete_tmp_folder_after_terminate=True, delete_output_folder_after_terminate=True, shared_mode=False, disable_evaluator_output=False, configuration_mode='SMAC')[source]

This class implements the classification task.

Parameters:

time_left_for_this_task : int, optional (default=3600)

Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

per_run_time_limit : int, optional (default=360)

Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.

initial_configurations_via_metalearning : int, optional (default=25)

Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.

ensemble_size : int, optional (default=50)

Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement.

ensemble_nbest : int, optional (default=50)

Only consider the ensemble_nbest models when building an ensemble. Implements Model Library Pruning from Getting the most out of ensemble selection.

seed : int, optional (default=1)

ml_memory_limit : int, optional (3072)

Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than ml_memory_limit MB.

include_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise specifies a set of estimators to use.

exclude_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise specifies a set of estimators not to use. Incompatible with include_estimators.

include_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise specifies a set of preprocessors to use.

exclude_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise specifies a set of preprocessors not to use. Incompatible with include_preprocessors.

resampling_strategy : string, optional (‘holdout’)

How to handle overfitting; might need ‘resampling_strategy_arguments’.

  • ‘holdout’: 66:33 (train:test) split
  • ‘holdout-iterative-fit’: 66:33 (train:test) split, calls iterative fit where possible
  • ‘cv’: cross-validation, requires ‘folds’ in ‘resampling_strategy_arguments’

resampling_strategy_arguments : dict, optional if ‘holdout’ (None)

Additional arguments for resampling_strategy:

  • ‘holdout’: None
  • ‘holdout-iterative-fit’: None
  • ‘cv’: {‘folds’: int}

tmp_folder : string, optional (None)

Folder to store configuration output and log files; if None, automatically uses /tmp/autosklearn_tmp_$pid_$random_number.

output_folder : string, optional (None)

Folder to store predictions for the optional test set; if None, automatically uses /tmp/autosklearn_output_$pid_$random_number.

delete_tmp_folder_after_terminate : bool, optional (True)

Remove tmp_folder when finished. If tmp_folder is None, the temporary folder will always be deleted.

delete_output_folder_after_terminate : bool, optional (True)

Remove output_folder when finished. If output_folder is None, the output folder will always be deleted.

shared_mode : bool, optional (False)

Run SMAC in shared-model mode. This only works if the arguments tmp_folder and output_folder are given and both delete_tmp_folder_after_terminate and delete_output_folder_after_terminate are set to False.

disable_evaluator_output: bool or list, optional (False)

If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when this is set to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:

  • 'y_optimization' : do not save the predictions for the optimization/validation set, which would later on be used to build an ensemble.
  • 'model' : do not save any model files

configuration_mode : SMAC or ROAR

Defines the configuration mode as described in the paper Sequential Model-Based Optimization for General Algorithm Configuration:

  • SMAC (default): Sequential Model-based Algorithm Configuration, which is a Bayesian optimization algorithm
  • ROAR: Random Online Aggressive Racing, which is basically random search
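
A minimal construction sketch using a few of the parameters above. The time budgets are illustrative, and the component name passed to include_estimators is an assumption that depends on the installed auto-sklearn version:

    import autosklearn.classification

    # Budget 10 minutes overall and at most 30 seconds per model fit
    # (illustrative values only).
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=600,
        per_run_time_limit=30,
        ensemble_size=50,
        ensemble_nbest=50,
        # Restrict the search space; 'random_forest' is an assumed component name.
        include_estimators=['random_forest'],
        # Keep the temporary files around for later inspection.
        tmp_folder='/tmp/autosklearn_example_tmp',
        delete_tmp_folder_after_terminate=False,
    )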

Attributes

cv_results_ (dict of numpy (masked) ndarrays) A dict with keys as column headers and values as columns that can be imported into a pandas DataFrame. Not all keys returned by scikit-learn are supported yet.
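
Assuming the estimator has already been fit, cv_results_ can be inspected as a pandas DataFrame (pandas is not required by auto-sklearn itself):

    import pandas as pd

    # Each row describes one configuration evaluated during the search.
    results = pd.DataFrame(automl.cv_results_)
    print(results.head())
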
fit(X, y, metric=None, feat_type=None, dataset_name=None)[source]

Fit auto-sklearn to the given training set (X, y).

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The target classes.

metric : callable, optional (default=’autosklearn.metrics.accuracy’)

An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics.

feat_type : list, optional (default=None)

List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.

dataset_name : str, optional (default=None)

Create nicer output. If None, a string will be determined by the md5 hash of the dataset.

Returns:

self
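
A short end-to-end sketch of fit() and predict() on a scikit-learn toy dataset; the dataset, time budgets and chosen metric are illustrative only:

    import sklearn.datasets
    import sklearn.model_selection
    import autosklearn.classification
    import autosklearn.metrics

    X, y = sklearn.datasets.load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300, per_run_time_limit=30)
    # Optimize macro-averaged F1 instead of the default accuracy.
    automl.fit(X_train, y_train,
               metric=autosklearn.metrics.f1_macro,
               dataset_name='digits')
    predictions = automl.predict(X_test)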

fit_ensemble(y, task=None, metric=None, precision='32', dataset_name=None, ensemble_nbest=None, ensemble_size=None)

Fit an ensemble to models trained during an optimization process.

All parameters are None by default. If no other value is given, the default values which were set in a call to fit() are used.

Parameters:

y : array-like

Target values.

task : int

A constant from the module autosklearn.constants. Determines the task type (binary classification, multiclass classification, multilabel classification or regression).

metric : callable, optional

An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics.

precision : str

Numeric precision used when loading ensemble data. Can be either '16', '32' or '64'.

dataset_name : str

Name of the current data set.

ensemble_nbest : int

Determines how many models should be considered for ensemble building. This is inspired by a concept called library pruning introduced in Getting the most out of ensemble selection.

ensemble_size : int

Size of the ensemble built by Ensemble Selection.

Returns:

self
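
A hedged sketch of building the ensemble as a separate step: disable ensemble construction during the search (ensemble_size=0) and call fit_ensemble() afterwards. X_train and y_train are assumed to exist, and MULTICLASS_CLASSIFICATION is one of the task constants from autosklearn.constants:

    import autosklearn.classification
    import autosklearn.constants

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300,
        per_run_time_limit=30,
        ensemble_size=0,  # skip ensemble building during the search
    )
    automl.fit(X_train, y_train)  # X_train, y_train assumed to be defined

    # Build a 50-model ensemble from the models found above.
    automl.fit_ensemble(
        y_train,
        task=autosklearn.constants.MULTICLASS_CLASSIFICATION,
        ensemble_size=50,
        ensemble_nbest=50,
    )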

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

predict(X, batch_size=None, n_jobs=1)[source]

Predict classes for X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples] or [n_samples, n_labels]

The predicted classes.

predict_proba(X, batch_size=None, n_jobs=1)[source]

Predict probabilities of classes for all samples X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples, n_classes] or [n_samples, n_labels]

The predicted class probabilities.

refit(X, y)

Refit all models found with fit to new data.

Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset, but does not keep any trained model and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. This method may also be used together with holdout to avoid using only 66% of the training data to fit the final model.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The targets.

Returns:

self
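
A hedged sketch of the cross-validation workflow described above: search with resampling_strategy='cv', then refit on the full training set before predicting. Dataset and time budgets are illustrative:

    import sklearn.datasets
    import sklearn.model_selection
    import autosklearn.classification

    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300,
        per_run_time_limit=30,
        resampling_strategy='cv',
        resampling_strategy_arguments={'folds': 5},
    )
    automl.fit(X_train, y_train)

    # The models were only evaluated via cross-validation; refit them on all
    # of X_train before calling predict().
    automl.refit(X_train, y_train)
    predictions = automl.predict(X_test)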

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self

show_models()

Return a representation of the final ensemble found by auto-sklearn.

Returns: str

Regression

class autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=3600, per_run_time_limit=360, initial_configurations_via_metalearning=25, ensemble_size=50, ensemble_nbest=50, seed=1, ml_memory_limit=3072, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, output_folder=None, delete_tmp_folder_after_terminate=True, delete_output_folder_after_terminate=True, shared_mode=False, disable_evaluator_output=False, configuration_mode='SMAC')[source]

This class implements the regression task.

Parameters:

time_left_for_this_task : int, optional (default=3600)

Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

per_run_time_limit : int, optional (default=360)

Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.

initial_configurations_via_metalearning : int, optional (default=25)

Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.

ensemble_size : int, optional (default=50)

Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement.

ensemble_nbest : int, optional (default=50)

Only consider the ensemble_nbest models when building an ensemble. Implements Model Library Pruning from Getting the most out of ensemble selection.

seed : int, optional (default=1)

ml_memory_limit : int, optional (3072)

Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than ml_memory_limit MB.

include_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise specifies a set of estimators to use.

exclude_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise specifies a set of estimators not to use. Incompatible with include_estimators.

include_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise specifies a set of preprocessors to use.

exclude_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise specifies a set of preprocessors not to use. Incompatible with include_preprocessors.

resampling_strategy : string, optional (‘holdout’)

How to handle overfitting; might need ‘resampling_strategy_arguments’.

  • ‘holdout’: 66:33 (train:test) split
  • ‘holdout-iterative-fit’: 66:33 (train:test) split, calls iterative fit where possible
  • ‘cv’: cross-validation, requires ‘folds’ in ‘resampling_strategy_arguments’

resampling_strategy_arguments : dict, optional if ‘holdout’ (None)

Additional arguments for resampling_strategy:

  • ‘holdout’: None
  • ‘holdout-iterative-fit’: None
  • ‘cv’: {‘folds’: int}

tmp_folder : string, optional (None)

Folder to store configuration output and log files; if None, automatically uses /tmp/autosklearn_tmp_$pid_$random_number.

output_folder : string, optional (None)

Folder to store predictions for the optional test set; if None, automatically uses /tmp/autosklearn_output_$pid_$random_number.

delete_tmp_folder_after_terminate : bool, optional (True)

Remove tmp_folder when finished. If tmp_folder is None, the temporary folder will always be deleted.

delete_output_folder_after_terminate : bool, optional (True)

Remove output_folder when finished. If output_folder is None, the output folder will always be deleted.

shared_mode : bool, optional (False)

Run SMAC in shared-model mode. This only works if the arguments tmp_folder and output_folder are given and both delete_tmp_folder_after_terminate and delete_output_folder_after_terminate are set to False.

disable_evaluator_output: bool or list, optional (False)

If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when this is set to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:

  • 'y_optimization' : do not save the predictions for the optimization/validation set, which would later on be used to build an ensemble.
  • 'model' : do not save any model files

configuration_mode : SMAC or ROAR

Defines the configuration mode as described in the paper Sequential Model-Based Optimization for General Algorithm Configuration:

  • SMAC (default): Sequential Model-based Algorithm Configuration, which is a Bayesian optimization algorithm
  • ROAR: Random Online Aggressive Racing, which is basically random search

Attributes

cv_results_ (dict of numpy (masked) ndarrays) A dict with keys as column headers and values as columns that can be imported into a pandas DataFrame. Not all keys returned by scikit-learn are supported yet.
fit(X, y, metric=None, feat_type=None, dataset_name=None)[source]

Fit auto-sklearn to the given training set (X, y).

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The regression target.

metric : callable, optional (default=’autosklearn.metrics.r2’)

An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics.

feat_type : list, optional (default=None)

List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded.

dataset_name : str, optional (default=None)

Create nicer output. If None, a string will be determined by the md5 hash of the dataset.

Returns:

self
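
A minimal regression sketch on a scikit-learn toy dataset; the dataset, metric and time budgets are illustrative only:

    import sklearn.datasets
    import sklearn.model_selection
    import autosklearn.regression
    import autosklearn.metrics

    X, y = sklearn.datasets.load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=300, per_run_time_limit=30)
    automl.fit(X_train, y_train,
               metric=autosklearn.metrics.r2,
               dataset_name='diabetes')
    predictions = automl.predict(X_test)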

fit_ensemble(y, task=None, metric=None, precision='32', dataset_name=None, ensemble_nbest=None, ensemble_size=None)

Fit an ensemble to models trained during an optimization process.

All parameters are None by default. If no other value is given, the default values which were set in a call to fit() are used.

Parameters:

y : array-like

Target values.

task : int

A constant from the module autosklearn.constants. Determines the task type (binary classification, multiclass classification, multilabel classification or regression).

metric : callable, optional

An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics.

precision : str

Numeric precision used when loading ensemble data. Can be either '16', '32' or '64'.

dataset_name : str

Name of the current data set.

ensemble_nbest : int

Determines how many models should be considered for ensemble building. This is inspired by a concept called library pruning introduced in Getting the most out of ensemble selection.

ensemble_size : int

Size of the ensemble built by Ensemble Selection.

Returns:

self

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

predict(X, batch_size=None, n_jobs=1)[source]

Predict regression target for X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples] or [n_samples, n_outputs]

The predicted values.

refit(X, y)

Refit all models found with fit to new data.

Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset, but does not keep any trained model and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. This method may also be used together with holdout to avoid using only 66% of the training data to fit the final model.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The targets.

Returns:

self

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self

show_models()

Return a representation of the final ensemble found by auto-sklearn.

Returns: str

Metrics

autosklearn.metrics.make_scorer(name, score_func, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)[source]

Make a scorer from a performance metric or loss function.

Factory inspired by scikit-learn which wraps scikit-learn scoring functions to be used in auto-sklearn.

Parameters:

name : str

Name of the metric; used to identify the scorer.

score_func : callable

Score function (or loss function) with signature score_func(y, y_pred, **kwargs).

greater_is_better : boolean, default=True

Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.

needs_proba : boolean, default=False

Whether score_func requires predict_proba to get probability estimates out of a classifier.

needs_threshold : boolean, default=False

Whether score_func takes a continuous decision certainty. This only works for binary classification.

**kwargs : additional arguments

Additional parameters to be passed to score_func.

Returns:

scorer : callable

Callable object that returns a scalar score; greater is better.
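
A hedged sketch of wrapping a custom loss with make_scorer(); the rmse helper below is not part of auto-sklearn:

    import numpy as np
    import autosklearn.metrics

    def rmse(y_true, y_pred):
        # Root mean squared error: lower is better, hence greater_is_better=False.
        return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

    rmse_scorer = autosklearn.metrics.make_scorer(
        name='rmse',
        score_func=rmse,
        greater_is_better=False,
    )

    # The resulting scorer can then be passed as the metric argument of fit(),
    # e.g. automl.fit(X_train, y_train, metric=rmse_scorer).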

Built-in Metrics

Classification

autosklearn.metrics.accuracy
autosklearn.metrics.balanced_accuracy
autosklearn.metrics.f1
autosklearn.metrics.f1_macro
autosklearn.metrics.f1_micro
autosklearn.metrics.f1_samples
autosklearn.metrics.f1_weighted
autosklearn.metrics.roc_auc
autosklearn.metrics.precision
autosklearn.metrics.precision_macro
autosklearn.metrics.precision_micro
autosklearn.metrics.precision_samples
autosklearn.metrics.precision_weighted
autosklearn.metrics.average_precision
autosklearn.metrics.recall
autosklearn.metrics.recall_macro
autosklearn.metrics.recall_micro
autosklearn.metrics.recall_samples
autosklearn.metrics.recall_weighted
autosklearn.metrics.log_loss
autosklearn.metrics.pac_score

Regression

autosklearn.metrics.r2
autosklearn.metrics.mean_squared_error
autosklearn.metrics.mean_absolute_error
autosklearn.metrics.median_absolute_error
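
These metric objects are passed directly as the metric argument of fit(). A short sketch, assuming clf and reg are already constructed AutoSklearnClassifier and AutoSklearnRegressor instances with training data at hand:

    import autosklearn.metrics

    clf.fit(X_train, y_train, metric=autosklearn.metrics.balanced_accuracy)
    reg.fit(X_train, y_train, metric=autosklearn.metrics.mean_absolute_error)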

Extension Interfaces

class autosklearn.pipeline.components.base.AutoSklearnClassificationAlgorithm[source]

Provide an abstract interface for classification algorithms in auto-sklearn.

See Extending auto-sklearn for more information.

get_estimator()[source]

Return the underlying estimator object.

Returns: estimator : the underlying estimator object

predict(X)[source]

The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,) or shape = (n_samples, n_labels)

Returns the predicted values

Notes

Please see the scikit-learn API documentation for further information.

predict_proba(X)[source]

Predict probabilities.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,) if n_classes == 2 else (n_samples, n_classes)
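
A rough, hedged sketch of a custom classification component wrapping a scikit-learn estimator. The exact set of required methods (in particular get_properties() and the registration mechanism) depends on the auto-sklearn version and is omitted here; see Extending auto-sklearn for the authoritative interface:

    from ConfigSpace.configuration_space import ConfigurationSpace

    from autosklearn.pipeline.components.base import \
        AutoSklearnClassificationAlgorithm


    class GaussianNBComponent(AutoSklearnClassificationAlgorithm):
        """Illustrative wrapper around sklearn.naive_bayes.GaussianNB."""

        def __init__(self, random_state=None):
            self.estimator = None
            self.random_state = random_state

        def fit(self, X, y):
            from sklearn.naive_bayes import GaussianNB
            self.estimator = GaussianNB()
            self.estimator.fit(X, y)
            return self

        def predict(self, X):
            if self.estimator is None:
                raise NotImplementedError()
            return self.estimator.predict(X)

        def predict_proba(self, X):
            if self.estimator is None:
                raise NotImplementedError()
            return self.estimator.predict_proba(X)

        @staticmethod
        def get_hyperparameter_search_space(dataset_properties=None):
            # GaussianNB has no hyperparameters to tune in this sketch.
            return ConfigurationSpace()
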
class autosklearn.pipeline.components.base.AutoSklearnRegressionAlgorithm[source]

Provide an abstract interface for regression algorithms in auto-sklearn.

Make a subclass of this and put it into the directory autosklearn/pipeline/components/regression to make it available.

get_estimator()[source]

Return the underlying estimator object.

Returns: estimator : the underlying estimator object

predict(X)[source]

The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,)

Returns the predicted values

Notes

Please see the scikit-learn API documentation for further information.

class autosklearn.pipeline.components.base.AutoSklearnPreprocessingAlgorithm[source]

Provide an abstract interface for preprocessing algorithms in auto-sklearn.

See Extending auto-sklearn for more information.

get_preprocessor()[source]

Return the underlying preprocessor object.

Returns: preprocessor : the underlying preprocessor object

transform(X)[source]

The transform function calls the transform function of the underlying scikit-learn model and returns the transformed array.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

X : array

Return the transformed training data

Notes

Please see the scikit-learn API documentation for further information.