APIs

Main modules

Classification

class autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=3600, per_run_time_limit=360, initial_configurations_via_metalearning=25, ensemble_size=50, ensemble_nbest=50, seed=1, ml_memory_limit=3072, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, output_folder=None, delete_tmp_folder_after_terminate=True, delete_output_folder_after_terminate=True, shared_mode=False, disable_evaluator_output=False, configuration_mode='SMAC')

This class implements the classification task.

Parameters:

time_left_for_this_task : int, optional (default=3600)

Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

per_run_time_limit : int, optional (default=360)

Time limit in seconds for a single call to the machine learning model. Model fitting is terminated if the machine learning algorithm exceeds this limit. Set this value high enough that typical machine learning algorithms can be fit on the training data.

initial_configurations_via_metalearning : int, optional (default=25)

Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Set to 0 to disable this and start the hyperparameter optimization from scratch.

ensemble_size : int, optional (default=50)

Number of models added to the ensemble built by “Ensemble selection from libraries of models”. Models are drawn with replacement.

ensemble_nbest : int, optional (default=50)

Only consider the ensemble_nbest best models when building an ensemble. Implements Model Library Pruning from “Getting the most out of ensemble selection”.
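A rough illustration of how ensemble_nbest and ensemble_size interact follows. It is a stdlib-only sketch of greedy ensemble selection with replacement plus library pruning, assuming each model's validation-set class probabilities are available and accuracy is the score. The function names are hypothetical and this is not auto-sklearn's internal code.

```python
from collections import Counter
from typing import Dict, List

# Illustrative sketch of greedy ensemble selection; not auto-sklearn's code.
Proba = List[List[float]]  # one row of class probabilities per validation sample


def accuracy(proba: Proba, y: List[int]) -> float:
    """Fraction of samples whose highest-probability class equals the label."""
    hits = sum(1 for row, t in zip(proba, y)
               if max(range(len(row)), key=row.__getitem__) == t)
    return hits / len(y)


def average(members: List[Proba]) -> Proba:
    """Element-wise mean of several models' probability matrices."""
    n_models = len(members)
    return [[sum(m[i][c] for m in members) / n_models
             for c in range(len(members[0][i]))]
            for i in range(len(members[0]))]


def ensemble_selection(val_proba: Dict[str, Proba], y: List[int],
                       ensemble_size: int, ensemble_nbest: int) -> Counter:
    # Library pruning: keep only the ensemble_nbest individually best models.
    ranked = sorted(val_proba, key=lambda k: accuracy(val_proba[k], y), reverse=True)
    library = ranked[:ensemble_nbest]
    chosen: List[str] = []
    for _ in range(ensemble_size):
        # Greedy step: try adding each library model (models may repeat, i.e.
        # they are drawn with replacement) and keep the best-scoring ensemble.
        best = max(library, key=lambda k: accuracy(
            average([val_proba[m] for m in chosen + [k]]), y))
        chosen.append(best)
    # Model name -> number of times drawn, i.e. its weight in the ensemble.
    return Counter(chosen)
```

Each model's count in the returned Counter acts as its weight: a model drawn 3 times with ensemble_size=50 would contribute 3/50 of the averaged prediction.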

seed : int, optional (default=1)

ml_memory_limit : int, optional (3072)

Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than ml_memory_limit MB.

include_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise, specifies the set of estimators to use.

exclude_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise, specifies the set of estimators not to use. Incompatible with include_estimators.

include_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise, specifies the set of preprocessors to use.

exclude_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise, specifies the set of preprocessors not to use. Incompatible with include_preprocessors.
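The include/exclude semantics above can be summarized by a small, hypothetical helper. The component names in the usage are illustrative placeholders; the helper only mirrors the documented behaviour and does not query auto-sklearn for its real list of components.

```python
from typing import Iterable, List, Optional

# Hypothetical helper mirroring the documented include/exclude semantics.


def resolve_components(available: Iterable[str],
                       include: Optional[List[str]] = None,
                       exclude: Optional[List[str]] = None) -> List[str]:
    candidates = list(available)
    if include is not None and exclude is not None:
        # include_* and exclude_* are documented as incompatible.
        raise ValueError("include and exclude cannot be used together")
    if include is not None:
        return [c for c in candidates if c in include]
    if exclude is not None:
        return [c for c in candidates if c not in exclude]
    # Both None: every available component is a candidate.
    return candidates
```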

resampling_strategy : string, optional (‘holdout’)

How to handle overfitting. Depending on the strategy, resampling_strategy_arguments may be required:

  • ‘holdout’: 66:33 (train:test) split
  • ‘holdout-iterative-fit’: 66:33 (train:test) split, calls iterative fit where possible
  • ‘cv’: cross-validation, requires ‘folds’
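For intuition, the index bookkeeping behind a 66:33 holdout split might look like the following sketch. It assumes a shuffled split with a training fraction of 0.67; the function name holdout_split is hypothetical, and auto-sklearn may shuffle or stratify differently.

```python
import random
from typing import List, Tuple

# Hypothetical sketch of a shuffled train/test holdout split (assumed 0.67 train).


def holdout_split(n_samples: int, train_fraction: float = 0.67,
                  seed: int = 1) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them into train and test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n_samples * train_fraction))
    return idx[:n_train], idx[n_train:]
```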

resampling_strategy_arguments : dict, optional if ‘holdout’ (None)

Additional arguments for resampling_strategy:

  • ‘holdout’: None
  • ‘holdout-iterative-fit’: None
  • ‘cv’: {‘folds’: int}
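The following sketch shows how a {‘folds’: k} argument could translate into k train/test index splits for the ‘cv’ strategy. The helper cv_folds is hypothetical and uses contiguous, unshuffled folds for simplicity; it does not reproduce auto-sklearn's exact splitting.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical sketch: turn resampling_strategy_arguments={'folds': k} into k splits.


def cv_folds(n_samples: int, strategy: str,
             arguments: Optional[Dict[str, int]] = None
             ) -> Iterator[Tuple[List[int], List[int]]]:
    if strategy != "cv":
        raise ValueError("this sketch only handles the 'cv' strategy")
    if not arguments or "folds" not in arguments:
        # 'cv' requires {'folds': int} in resampling_strategy_arguments.
        raise ValueError("'cv' requires resampling_strategy_arguments={'folds': k}")
    k = arguments["folds"]
    # Contiguous, unshuffled folds; the first n_samples % k folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```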

tmp_folder : string, optional (None)

Folder to store configuration output and log files. If None, /tmp/autosklearn_tmp_$pid_$random_number is used automatically.

output_folder : string, optional (None)

Folder to store predictions for an optional test set. If None, /tmp/autosklearn_output_$pid_$random_number is used automatically.

delete_tmp_folder_after_terminate : bool, optional (True)

Remove tmp_folder when finished. If tmp_folder is None, the automatically created temporary directory is always deleted.

delete_output_folder_after_terminate : bool, optional (True)

Remove output_folder when finished. If output_folder is None, the automatically created output directory is always deleted.

shared_mode : bool, optional (False)

Run SMAC in shared-model mode. This only works if the arguments tmp_folder and output_folder are given and both delete_tmp_folder_after_terminate and delete_output_folder_after_terminate are set to False.

disable_evaluator_output : bool, optional (False)

Disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when this flag is set to True.

Attributes

grid_scores_ : list of named tuples

Contains scores for all parameter combinations in param_grid. Each entry corresponds to one parameter setting. Each named tuple has the attributes:

  • parameters, a dict of parameter settings
  • mean_validation_score, the mean score over the cross-validation folds
  • cv_validation_scores, the list of scores for each fold

cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame. This attribute is a backport that already provides the extended output introduced in scikit-learn 0.18. Not all keys returned by scikit-learn are supported yet.
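Because cv_results_ is a dict of equal-length columns, it can be turned into one record per configuration, which is what pandas.DataFrame(cv_results_) does in tabular form. The sketch below uses plain lists instead of (masked) numpy arrays and assumes the scikit-learn-style keys 'mean_test_score' and 'params' are present; check which keys your version actually provides. columns_to_rows is a hypothetical name.

```python
from typing import Any, Dict, List

# Hypothetical helper; assumes every column has the same length.


def columns_to_rows(cv_results: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a dict of equal-length columns into one dict per configuration."""
    keys = list(cv_results)
    n_rows = len(cv_results[keys[0]]) if keys else 0
    return [{k: cv_results[k][i] for k in keys} for i in range(n_rows)]
```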

fit(X, y, metric='acc_metric', feat_type=None, dataset_name=None)

Fit auto-sklearn to the given training set (X, y).

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The target classes.

metric : str, optional (default=’acc_metric’)

The metric to optimize for. Can be one of: [‘acc_metric’, ‘auc_metric’, ‘bac_metric’, ‘f1_metric’, ‘pac_metric’]. A description of the metrics can be found in the paper describing the AutoML Challenge.
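As an example of how the metrics differ, balanced accuracy averages the recall of each class, so always predicting the majority class is not rewarded on imbalanced data. The sketch computes the plain, unnormalized quantity; the AutoML Challenge's bac_metric may additionally rescale it, so treat this as an illustration only. balanced_accuracy is a hypothetical name.

```python
from collections import defaultdict
from typing import Hashable, Sequence

# Unnormalized balanced accuracy; the challenge's bac_metric may rescale this.


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of the per-class recall over all classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

On labels [0, 0, 0, 1] with predictions [0, 0, 0, 0], plain accuracy is 0.75 while balanced accuracy is 0.5.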

feat_type : list, optional (default=None)

A list of strings of length X.shape[1] describing the type of each attribute. Possible types are Categorical and Numerical. Categorical attributes are automatically one-hot encoded.
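To illustrate what one-hot encoding of Categorical attributes means, here is a minimal sketch with a hypothetical one_hot_encode helper. It assumes categorical columns hold numeric codes and that every category appears in X; it is not auto-sklearn's encoder.

```python
from typing import List, Sequence

# Hypothetical illustration of one-hot encoding driven by feat_type.


def one_hot_encode(X: Sequence[Sequence[float]], feat_type: List[str]) -> List[List[float]]:
    """Expand 'Categorical' columns into 0/1 indicator columns."""
    if len(feat_type) != len(X[0]):
        raise ValueError("feat_type needs one entry per column of X")
    # The sorted distinct values of each categorical column define its indicators.
    categories = {j: sorted({row[j] for row in X})
                  for j, t in enumerate(feat_type) if t == "Categorical"}
    out = []
    for row in X:
        new_row: List[float] = []
        for j, value in enumerate(row):
            if j in categories:
                new_row.extend(1.0 if value == c else 0.0 for c in categories[j])
            else:
                new_row.append(float(value))
        out.append(new_row)
    return out
```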

dataset_name : str, optional (default=None)

Name of the dataset, used to make the output more readable. If None, a name is derived from the MD5 hash of the dataset.
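A default name derived from an MD5 hash could be computed roughly as follows. Which bytes auto-sklearn actually hashes is an assumption here: the sketch hashes the repr() of the data, and default_dataset_name is a hypothetical name.

```python
import hashlib
from typing import Sequence

# Hypothetical sketch; auto-sklearn may hash a different byte representation.


def default_dataset_name(X: Sequence[Sequence[float]]) -> str:
    """Derive a stable name from an MD5 digest of the data's text form."""
    # repr() of nested lists is deterministic for the same values.
    return hashlib.md5(repr(X).encode("utf-8")).hexdigest()
```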

Returns:

self

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
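The deep=True behaviour follows the scikit-learn convention: parameters of nested estimators are reported under <component>__<parameter> keys. The Component class below is a hypothetical, minimal stand-in used only to show that convention.

```python
from typing import Any, Dict

# Hypothetical stand-in for a scikit-learn style estimator.


class Component:
    def __init__(self, **params: Any) -> None:
        self._params = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    # Nested estimators contribute <component>__<parameter> keys.
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out
```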

predict(X)

Predict classes for X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples] or [n_samples, n_labels]

The predicted classes.

predict_proba(X)

Predict probabilities of classes for all samples X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples, n_classes] or [n_samples, n_labels]

The predicted class probabilities.
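For single-label classification, class predictions are commonly obtained by taking, for each sample, the class with the highest predicted probability. The sketch below shows that argmax relationship; it assumes the probability columns follow the order of a given list of class labels, and predict_from_proba is a hypothetical name.

```python
from typing import List, Sequence

# Hypothetical sketch of the argmax link between predict_proba and predict.


def predict_from_proba(proba: Sequence[Sequence[float]], classes: Sequence[str]) -> List[str]:
    """Pick, for each row, the class whose predicted probability is highest."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]
```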

refit(X, y)

Refit all models found with fit to new data.

Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset but does not keep any trained model, and therefore cannot be used to predict new data points. This method fits all models found during a call to fit on the given data. It may also be used together with holdout to avoid fitting the final models on only 66% of the training data.
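The difference between scoring models during cross-validation and refitting them can be illustrated with a toy model. Everything below (MajorityClassifier, cross_validate, refit_on_all) is hypothetical and ignores features entirely; it only shows that the k fold models yield scores, while a separate fit on all data yields the model used for prediction.

```python
from collections import Counter
from typing import List, Optional

# Toy illustration only; these names are hypothetical, not auto-sklearn code.


class MajorityClassifier:
    """Predicts the most frequent training label and ignores features."""

    def __init__(self) -> None:
        self.label_: Optional[int] = None
        self.n_fit_samples_ = 0

    def fit(self, y: List[int]) -> "MajorityClassifier":
        self.label_ = Counter(y).most_common(1)[0][0]
        self.n_fit_samples_ = len(y)
        return self


def cross_validate(y: List[int], k: int) -> List[float]:
    """Fit one model per fold and keep only its held-out accuracy."""
    scores = []
    for fold in range(k):
        train = [label for i, label in enumerate(y) if i % k != fold]
        test = [label for i, label in enumerate(y) if i % k == fold]
        model = MajorityClassifier().fit(train)
        scores.append(sum(label == model.label_ for label in test) / len(test))
        # The fold model is not stored anywhere, so nothing is left to predict with.
    return scores


def refit_on_all(y: List[int]) -> MajorityClassifier:
    """Fit a single model on all samples; this is the one kept for prediction."""
    return MajorityClassifier().fit(y)
```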

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The targets.

Returns:

self

set_params(**params)

Set the parameters of this estimator.

This method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
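Routing of <component>__<parameter> keys to nested objects can be sketched as follows. The Component class is a hypothetical stand-in; unlike scikit-learn's implementation, it does not validate parameter names.

```python
from typing import Any

# Hypothetical stand-in showing nested set_params routing.


class Component:
    def __init__(self, **params: Any) -> None:
        self.params = dict(params)

    def set_params(self, **params: Any) -> "Component":
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # '<component>__<parameter>': forward to the nested object.
                self.params[name].set_params(**{sub_key: value})
            else:
                # No validation of parameter names in this sketch.
                self.params[key] = value
        return self
```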

Returns:

self

show_models()

Return a representation of the final ensemble found by auto-sklearn.

Returns:

str

Regression

class autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=3600, per_run_time_limit=360, initial_configurations_via_metalearning=25, ensemble_size=50, ensemble_nbest=50, seed=1, ml_memory_limit=3072, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, output_folder=None, delete_tmp_folder_after_terminate=True, delete_output_folder_after_terminate=True, shared_mode=False, disable_evaluator_output=False, configuration_mode='SMAC')

This class implements the regression task.

Parameters:

time_left_for_this_task : int, optional (default=3600)

Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

per_run_time_limit : int, optional (default=360)

Time limit in seconds for a single call to the machine learning model. Model fitting is terminated if the machine learning algorithm exceeds this limit. Set this value high enough that typical machine learning algorithms can be fit on the training data.

initial_configurations_via_metalearning : int, optional (default=25)

Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Set to 0 to disable this and start the hyperparameter optimization from scratch.

ensemble_size : int, optional (default=50)

Number of models added to the ensemble built by “Ensemble selection from libraries of models”. Models are drawn with replacement.

ensemble_nbest : int, optional (default=50)

Only consider the ensemble_nbest best models when building an ensemble. Implements Model Library Pruning from “Getting the most out of ensemble selection”.

seed : int, optional (default=1)

ml_memory_limit : int, optional (3072)

Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than ml_memory_limit MB.

include_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise, specifies the set of estimators to use.

exclude_estimators : list, optional (None)

If None, all possible estimators are used. Otherwise, specifies the set of estimators not to use. Incompatible with include_estimators.

include_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise, specifies the set of preprocessors to use.

exclude_preprocessors : list, optional (None)

If None, all possible preprocessors are used. Otherwise, specifies the set of preprocessors not to use. Incompatible with include_preprocessors.

resampling_strategy : string, optional (‘holdout’)

How to handle overfitting. Depending on the strategy, resampling_strategy_arguments may be required:

  • ‘holdout’: 66:33 (train:test) split
  • ‘holdout-iterative-fit’: 66:33 (train:test) split, calls iterative fit where possible
  • ‘cv’: cross-validation, requires ‘folds’

resampling_strategy_arguments : dict, optional if ‘holdout’ (None)

Additional arguments for resampling_strategy:

  • ‘holdout’: None
  • ‘holdout-iterative-fit’: None
  • ‘cv’: {‘folds’: int}

tmp_folder : string, optional (None)

Folder to store configuration output and log files. If None, /tmp/autosklearn_tmp_$pid_$random_number is used automatically.

output_folder : string, optional (None)

Folder to store predictions for an optional test set. If None, /tmp/autosklearn_output_$pid_$random_number is used automatically.

delete_tmp_folder_after_terminate : bool, optional (True)

Remove tmp_folder when finished. If tmp_folder is None, the automatically created temporary directory is always deleted.

delete_output_folder_after_terminate : bool, optional (True)

Remove output_folder when finished. If output_folder is None, the automatically created output directory is always deleted.

shared_mode : bool, optional (False)

Run SMAC in shared-model mode. This only works if the arguments tmp_folder and output_folder are given and both delete_tmp_folder_after_terminate and delete_output_folder_after_terminate are set to False.

disable_evaluator_output : bool, optional (False)

Disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when this flag is set to True.

Attributes

grid_scores_ : list of named tuples

Contains scores for all parameter combinations in param_grid. Each entry corresponds to one parameter setting. Each named tuple has the attributes:

  • parameters, a dict of parameter settings
  • mean_validation_score, the mean score over the cross-validation folds
  • cv_validation_scores, the list of scores for each fold

cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame. This attribute is a backport that already provides the extended output introduced in scikit-learn 0.18. Not all keys returned by scikit-learn are supported yet.

fit(X, y, metric='r2_metric', feat_type=None, dataset_name=None)

Fit auto-sklearn to the given training set (X, y).

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The regression target.

metric : str, optional (default=’r2_metric’)

The metric to optimize for. Can be one of: [‘r2_metric’, ‘a_metric’]. A description of the metrics can be found in the paper describing the AutoML Challenge.
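For reference, the coefficient of determination behind r2_metric is R² = 1 - SS_res / SS_tot. The sketch below computes the textbook version; the challenge's r2_metric may differ in details such as normalization, so this is an assumption-laden illustration, and r2_score here is a hypothetical local function, not a library import.

```python
from typing import Sequence

# Textbook R^2; the AutoML Challenge's r2_metric may differ in details.


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of y_true gives 0.0.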

feat_type : list, optional (default=None)

A list of strings of length X.shape[1] describing the type of each attribute. Possible types are Categorical and Numerical. Categorical attributes are automatically one-hot encoded.

dataset_name : str, optional (default=None)

Name of the dataset, used to make the output more readable. If None, a name is derived from the MD5 hash of the dataset.

Returns:

self

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

predict(X)

Predict regression target for X.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

Returns:

y : array of shape = [n_samples] or [n_samples, n_outputs]

The predicted values.

refit(X, y)

Refit all models found with fit to new data.

Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset but does not keep any trained model, and therefore cannot be used to predict new data points. This method fits all models found during a call to fit on the given data. It may also be used together with holdout to avoid fitting the final models on only 66% of the training data.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The targets.

Returns:

self

set_params(**params)

Set the parameters of this estimator.

This method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

show_models()

Return a representation of the final ensemble found by auto-sklearn.

Returns:

str

Extension Interfaces

class autosklearn.pipeline.components.base.AutoSklearnClassificationAlgorithm

Provide an abstract interface for classification algorithms in auto-sklearn.

See Extending auto-sklearn for more information.

get_estimator()

Return the underlying estimator object.

Returns:

estimator : the underlying estimator object

predict(X)

The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,) or shape = (n_samples, n_labels)

Returns the predicted values

Notes

Please see the scikit-learn API documentation for further information.

predict_proba(X)

Predict probabilities.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,) if n_classes == 2 else (n_samples, n_classes)
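A minimal sketch of a component following this interface is shown below. It assumes nothing about auto-sklearn's internals: MajorityEstimator and MajorityComponent are hypothetical, and the class deliberately does not subclass AutoSklearnClassificationAlgorithm so that it runs without auto-sklearn installed. A real component would subclass it and provide whatever additional methods Extending auto-sklearn requires.

```python
from typing import Any, List, Optional, Sequence

# Hypothetical toy classes illustrating the delegation pattern only.


class MajorityEstimator:
    """Toy scikit-learn style model that predicts the most frequent class."""

    def fit(self, X: Sequence[Any], y: Sequence[int]) -> "MajorityEstimator":
        self.classes_ = sorted(set(y))
        self.proba_ = [list(y).count(c) / len(y) for c in self.classes_]
        return self

    def predict(self, X: Sequence[Any]) -> List[int]:
        best = self.classes_[self.proba_.index(max(self.proba_))]
        return [best for _ in X]

    def predict_proba(self, X: Sequence[Any]) -> List[List[float]]:
        return [list(self.proba_) for _ in X]


class MajorityComponent:
    """Wraps the estimator and forwards predict/predict_proba to it."""

    def __init__(self) -> None:
        self.estimator: Optional[MajorityEstimator] = None

    def fit(self, X: Sequence[Any], y: Sequence[int]) -> "MajorityComponent":
        self.estimator = MajorityEstimator().fit(X, y)
        return self

    def get_estimator(self) -> Optional[MajorityEstimator]:
        return self.estimator

    def predict(self, X: Sequence[Any]) -> List[int]:
        if self.estimator is None:
            raise RuntimeError("fit must be called before predict")  # sketch's own choice
        return self.estimator.predict(X)

    def predict_proba(self, X: Sequence[Any]) -> List[List[float]]:
        if self.estimator is None:
            raise RuntimeError("fit must be called before predict_proba")
        return self.estimator.predict_proba(X)
```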

class autosklearn.pipeline.components.base.AutoSklearnRegressionAlgorithm

Provide an abstract interface for regression algorithms in auto-sklearn.

Make a subclass of this and put it into the directory autosklearn/pipeline/components/regression to make it available.

get_estimator()

Return the underlying estimator object.

Returns:

estimator : the underlying estimator object

predict(X)

The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

array, shape = (n_samples,)

Returns the predicted values

Notes

Please see the scikit-learn API documentation for further information.

class autosklearn.pipeline.components.base.AutoSklearnPreprocessingAlgorithm

Provide an abstract interface for preprocessing algorithms in auto-sklearn.

See Extending auto-sklearn for more information.

get_preprocessor()

Return the underlying preprocessor object.

Returns:

preprocessor : the underlying preprocessor object

transform(X)

The transform function calls the transform function of the underlying scikit-learn model and returns the transformed array.

Parameters:

X : array-like, shape = (n_samples, n_features)

Returns:

X : array

Returns the transformed data.

Notes

Please see the scikit-learn API documentation for further information.
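Preprocessing components follow the same delegation pattern, with transform in place of predict. Below, MinMaxScaler and MinMaxComponent are hypothetical toy classes written for illustration (not scikit-learn's MinMaxScaler); a real component would subclass AutoSklearnPreprocessingAlgorithm.

```python
from typing import Any, List, Optional, Sequence

# Hypothetical toy classes illustrating the preprocessing delegation pattern.


class MinMaxScaler:
    """Toy scaler mapping each column to [0, 1] using the training min/max."""

    def fit(self, X: Sequence[Sequence[float]]) -> "MinMaxScaler":
        cols = list(zip(*X))
        self.min_ = [min(c) for c in cols]
        # Constant columns get range 1.0 to avoid division by zero.
        self.range_ = [(max(c) - min(c)) or 1.0 for c in cols]
        return self

    def transform(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        return [[(v - lo) / r for v, lo, r in zip(row, self.min_, self.range_)]
                for row in X]


class MinMaxComponent:
    """Wraps the scaler and forwards transform to it."""

    def __init__(self) -> None:
        self.preprocessor: Optional[MinMaxScaler] = None

    def fit(self, X: Sequence[Sequence[float]], y: Any = None) -> "MinMaxComponent":
        self.preprocessor = MinMaxScaler().fit(X)
        return self

    def get_preprocessor(self) -> Optional[MinMaxScaler]:
        return self.preprocessor

    def transform(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        if self.preprocessor is None:
            raise RuntimeError("fit must be called before transform")
        return self.preprocessor.transform(X)
```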