APIs¶
Main modules¶
Classification¶
- class autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=3600, per_run_time_limit=None, initial_configurations_via_metalearning=25, ensemble_size: int | None = None, ensemble_class: Type[AbstractEnsemble] | Literal['default'] | None = 'default', ensemble_kwargs: Dict[str, Any] | None = None, ensemble_nbest=50, max_models_on_disc=50, seed=1, memory_limit=3072, include: Optional[Dict[str, List[str]]] = None, exclude: Optional[Dict[str, List[str]]] = None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, delete_tmp_folder_after_terminate=True, n_jobs: Optional[int] = None, dask_client: Optional[dask.distributed.Client] = None, disable_evaluator_output=False, get_smac_object_callback=None, smac_scenario_args=None, logging_config=None, metadata_directory=None, metric: Scorer | Sequence[Scorer] | None = None, scoring_functions: Optional[List[Scorer]] = None, load_models: bool = True, get_trials_callback: SMACCallback | None = None, dataset_compression: Union[bool, Mapping[str, Any]] = True, allow_string_features: bool = True, disable_progress_bar: bool = False)[source]¶
This class implements the classification task.
- Parameters
- time_left_for_this_task: int, optional (default=3600)
Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.
- per_run_time_limit: int, optional (default=1/10 of time_left_for_this_task)
Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.
- initial_configurations_via_metalearning: int, optional (default=25)
Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.
- ensemble_size: int, optional
Number of models added to the ensemble built by ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
- ensemble_class: Type[AbstractEnsemble] | "default", optional (default="default")
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble.
If set to "default", EnsembleSelection is used for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
- ensemble_kwargs: Dict, optional
Keyword arguments that are passed to the ensemble class upon initialization.
- ensemble_nbest: int, optional (default=50)
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting Most out of Ensemble Selection. This pruning step is independent of the ensemble_class argument and is done prior to constructing an ensemble.
- max_models_on_disc: int, optional (default=50)
Defines the maximum number of models kept on disc. Any additional models are permanently deleted. Because of this, the value also sets the upper limit on how many models can be used in an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on the disc.
- seed: int, optional (default=1)
Used to seed SMAC. Will determine the output file names.
- memory_limit: int, optional (3072)
Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB.
Important notes:
If None is provided, no memory limit is set.
In case of multi-processing, memory_limit will be per job, so the total usage is n_jobs x memory_limit.
The memory limit also applies to the ensemble creation process.
- include: Optional[Dict[str, List[str]]] = None
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are included in the search. See /pipeline/components/<step>/* for available components.
Incompatible with parameter exclude.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - only when using AutoSklearnClassifier
"regressor" - only when using AutoSklearnRegressor
Example:
include = { 'classifier': ["random_forest"], 'feature_preprocessor': ["no_preprocessing"] }
- exclude: Optional[Dict[str, List[str]]] = None
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are excluded from the search. See /pipeline/components/<step>/* for available components.
Incompatible with parameter include.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - only when using AutoSklearnClassifier
"regressor" - only when using AutoSklearnRegressor
Example:
exclude = { 'classifier': ["random_forest"], 'feature_preprocessor': ["no_preprocessing"] }
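A sketch of how such a dictionary is passed to the estimator (the component names come from the examples above; the time budget is illustrative):

import autosklearn.classification

# Search only random forests, with no feature preprocessing.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    include={
        'classifier': ["random_forest"],
        'feature_preprocessor': ["no_preprocessing"],
    },
)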
- resampling_strategy: str | BaseCrossValidator | _RepeatedSplits | BaseShuffleSplit = "holdout"
How to handle overfitting; might need resampling_strategy_arguments if using a "cv"-based method or a Splitter object.
- Options
"holdout" - use a 67:33 (train:test) split
"cv" - perform cross-validation; requires "folds" in resampling_strategy_arguments
"holdout-iterative-fit" - same as "holdout", but iterative fit where possible
"cv-iterative-fit" - same as "cv", but iterative fit where possible
"partial-cv" - same as "cv", but uses intensification
BaseCrossValidator - any BaseCrossValidator subclass (found in the scikit-learn model_selection module)
_RepeatedSplits - any _RepeatedSplits subclass (found in the scikit-learn model_selection module)
BaseShuffleSplit - any BaseShuffleSplit subclass (found in the scikit-learn model_selection module)
If using a Splitter object that relies on the dataset retaining its current size and order, you will need to look at the dataset_compression argument and ensure that "subsample" is not included in the applied compression "methods", or disable dataset compression entirely with False.
- resampling_strategy_arguments: Optional[Dict] = None
Additional arguments for resampling_strategy; required if using a "cv"-based strategy. The default arguments if left as None are:
{
    "train_size": 0.67,    # The size of the training set
    "shuffle": True,       # Whether to shuffle before splitting data
    "folds": 5             # Used in 'cv' based resampling strategies
}
If using a custom splitter class which takes n_splits (such as PredefinedSplit), the value of "folds" will be used.
- tmp_folder: string, optional (None)
Folder to store configuration output and log files. If None, /tmp/autosklearn_tmp_$pid_$random_number is used automatically.
- delete_tmp_folder_after_terminate: bool, optional (True)
Remove tmp_folder when finished. If tmp_folder is None, the tmp_dir will always be deleted.
- n_jobs: int, optional, experimental
The number of jobs to run in parallel for fit(). -1 means using all processors.
Important notes:
By default, Auto-sklearn uses one core.
Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble.
predict() is not affected by n_jobs (in contrast to most scikit-learn models).
If dask_client is None, a new dask client is created.
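A minimal sketch of a parallel run under the notes above (the budgets are illustrative):

import autosklearn.classification

# Four workers, each limited to 3072 MB, so total memory usage may
# reach roughly n_jobs x memory_limit = 12288 MB.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,
    n_jobs=4,
    memory_limit=3072,
)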
- dask_client: dask.distributed.Client, optional
User-created dask client, can be used to start a dask cluster and then attach auto-sklearn to it.
- disable_evaluator_output: bool or list, optional (False)
If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when setting this to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:
'y_optimization' - do not save the predictions for the optimization set, which would later on be used to build an ensemble.
'model' - do not save any model files.
- smac_scenario_args: dict, optional (None)
Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.
- get_smac_object_callback: callable
Callback function to create an object of class smac.facade.AbstractFacade. The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC.
- logging_config: dict, optional (None)
Dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.
- metadata_directory: str, optional (None)
Path to the metadata directory. If None, the default directory (autosklearn.metalearning.files) is used.
- metric: Scorer, optional (None)
An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics. If None is provided, a default metric is selected depending on the task.
- scoring_functions: List[Scorer], optional (None)
List of scorers which will be calculated for each pipeline. Results will be available via cv_results_.
- load_models: bool, optional (True)
Whether to load the models after fitting Auto-sklearn.
- get_trials_callback: callable
A callable with the following definition:
(smac.SMBO, smac.RunInfo, smac.RunValue, time_left: float) -> bool | None
This will be called after SMAC, the underlying optimizer for auto-sklearn, finishes evaluating each run.
You can use this to record your own information about the optimization process, or to enable early stopping based on some criteria.
See the example: Early Stopping And Callbacks.
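A sketch of such a callback, modeled on that early-stopping example (the smac import paths assume the SMAC version bundled with auto-sklearn; the 0.10 threshold is illustrative):

from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue

import autosklearn.classification


def stop_on_low_cost(smbo: SMBO, run_info: RunInfo, result: RunValue, time_left: float):
    # Returning False stops the optimization; returning None continues as normal.
    if result.cost <= 0.10:
        print(f"Stopping early: cost {result.cost} reached")
        return False


automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    get_trials_callback=stop_on_low_cost,
)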
- dataset_compression: Union[bool, Mapping[str, Any]] = True
We compress datasets so that they fit into some predefined amount of memory. Currently this does not apply to dataframes or sparse arrays, only to raw numpy arrays.
NOTE: if using a custom resampling_strategy that relies on a specific size or ordering of the data, this must be disabled to preserve these properties.
You can disable this entirely by passing False, or leave it as the default True, which is equivalent to the following configuration:
{ "memory_allocation": 0.1, "methods": ["precision", "subsample"] }
You can also pass your own configuration with the same keys, choosing from the available "methods". The available options are described here:
- memory_allocation
By default, we attempt to fit the dataset into 0.1 * memory_limit. This float value can be set with "memory_allocation": 0.1. We also allow for specifying absolute memory in MB, e.g. 10MB is "memory_allocation": 10.
The memory used by the dataset is checked after each reduction method is performed. If the dataset fits into the allocated memory, any further methods listed in "methods" will not be performed.
For example, if methods: ["precision", "subsample"] and the "precision" reduction step was enough to make the dataset fit into memory, then the "subsample" reduction step will not be performed.
- methods
We provide the following methods for reducing the dataset size. These can be provided in a list and are performed in the order given.
"precision" - We reduce floating point precision as follows:
np.float128 -> np.float64
np.float96 -> np.float64
np.float64 -> np.float32
"subsample" - We subsample data such that it fits directly into the memory allocation memory_allocation * memory_limit. Therefore, this should likely be the last method listed in "methods". Subsampling takes into account classification labels and stratifies accordingly. We guarantee that at least one occurrence of each label is included in the sampled set.
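A sketch of a custom configuration (the 512 MB budget is illustrative): keep only precision reduction, so a custom splitter's size and ordering are preserved.

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    dataset_compression={
        "memory_allocation": 512,   # absolute budget in MB for the dataset
        "methods": ["precision"],   # no "subsample": dataset size/order preserved
    },
)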
- allow_string_features: bool = True
Whether auto-sklearn should process string features. By default, text preprocessing is enabled.
- disable_progress_bar: bool = False
Whether to disable the progress bar that is displayed in the console while fitting to the training data.
- Attributes
- cv_results_: dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.
Not all keys returned by scikit-learn are supported yet.
- performance_over_time_: pandas.core.frame.DataFrame
A DataFrame containing the models' performance over time. Can be used for plotting directly. Please refer to the example Train and Test Inputs.
- fit(X, y, X_test=None, y_test=None, feat_type=None, dataset_name=None)[source]¶
Fit auto-sklearn to given training set (X, y).
Fit both optimizes the machine learning models and builds an ensemble out of them.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
The target classes.
- X_test: array-like or sparse matrix of shape = [n_samples, n_features]
Test data input samples. Will be used to save test predictions for all models. This allows evaluating the performance of Auto-sklearn over time.
- y_test: array-like, shape = [n_samples] or [n_samples, n_outputs]
Test data target classes. Will be used to calculate the test error of all models. This allows evaluating the performance of Auto-sklearn over time.
- feat_type: list, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.
- dataset_name: str, optional (default=None)
Create nicer output. If None, a string will be determined by the md5 hash of the dataset.
- Returns
- self
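A minimal end-to-end sketch of fit() (the dataset and budgets are illustrative):

import sklearn.datasets
import sklearn.model_selection

import autosklearn.classification

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,   # total search budget in seconds
    per_run_time_limit=30,         # budget per candidate model
)
# Passing X_test/y_test lets auto-sklearn track test performance over time.
automl.fit(X_train, y_train, X_test=X_test, y_test=y_test, dataset_name="breast_cancer")
print(automl.sprint_statistics())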
- fit_ensemble(y, task: int = None, precision: Literal[16, 32, 64] = 32, dataset_name: Optional[str] = None, ensemble_size: int | None = None, ensemble_kwargs: Optional[Dict[str, Any]] = None, ensemble_nbest: Optional[int] = None, ensemble_class: Type[AbstractEnsemble] | Literal['default'] | None = 'default', metric: Scorer | Sequence[Scorer] | None = None)¶
Fit an ensemble to models trained during an optimization process.
All parameters are None by default. If no other value is given, the default values which were set in a call to fit() are used.
- Parameters
- y: array-like
Target values.
- task: int
A constant from the module autosklearn.constants. Determines the task type (binary classification, multiclass classification, multilabel classification or regression).
- precision: int
Numeric precision used when loading ensemble data. Can be either 16, 32 or 64.
- dataset_name: str
Name of the current data set.
- ensemble_size: int, optional
Number of models added to the ensemble built by ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
- ensemble_kwargs: Dict, optional
Keyword arguments that are passed to the ensemble class upon initialization.
- ensemble_nbest: int
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting Most out of Ensemble Selection. This pruning step is independent of the ensemble_class argument and is done prior to constructing an ensemble.
- ensemble_class: Type[AbstractEnsemble] | "default", optional (default="default")
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble.
If set to "default", EnsembleSelection is used for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
- metric: Scorer | Sequence[Scorer] | None = None
A metric or list of metrics to score the ensemble with.
- Returns
- self
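A sketch of fitting an ensemble after the search (assumes automl was already fitted, e.g. with ensemble building disabled; the sizes are illustrative):

from autosklearn.ensembles.ensemble_selection import EnsembleSelection

automl.fit_ensemble(
    y_train,
    ensemble_class=EnsembleSelection,
    ensemble_kwargs={"ensemble_size": 10},  # preferred over the deprecated ensemble_size
    ensemble_nbest=25,
)
print(automl.leaderboard())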
- fit_pipeline(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], config: Union[ConfigSpace.configuration_space.Configuration, Dict[str, Union[str, float, int]]], dataset_name: Optional[str] = None, X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, feat_type: Optional[List[str]] = None, *args, **kwargs: Dict) Tuple[Optional[autosklearn.pipeline.base.BasePipeline], smac.runhistory.runhistory.RunInfo, smac.runhistory.runhistory.RunValue] ¶
Fits an individual pipeline configuration and returns the result to the user.
The Estimator constraints are honored, for example the resampling strategy, or memory constraints, unless directly provided to the method. By default, this method supports the same signature as fit(), and any extra arguments are redirected to the TAE evaluation function, which allows for further customization while building a pipeline.
Any additional argument provided is directly passed to the worker exercising the run.
- Parameters
- X: array-like, shape = (n_samples, n_features)
The features used for training
- y: array-like
The labels used for training
- X_test: Optional array-like, shape = (n_samples, n_features)
If provided, the testing performance will be tracked on these features.
- y_test: array-like
If provided, the testing performance will be tracked on these labels.
- config: Union[Configuration, Dict[str, Union[str, float, int]]]
A configuration object used to define the pipeline steps. If a dict is passed, a configuration is created based on this dict.
- dataset_name: Optional[str]
Name that will be used to tag and identify the Auto-Sklearn run.
- feat_type: list, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.
- Returns
- pipeline: Optional[BasePipeline]
The fitted pipeline. In case of failure while fitting the pipeline, a None is returned.
- run_info: RunInfo
A named tuple that contains the configuration launched
- run_value: RunValue
A named tuple that contains the result of the run
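A sketch of evaluating a single configuration (assumes X_train etc. from the fit() sketch above):

# Sample one configuration from the estimator's search space and evaluate it
# under the estimator's resampling and memory constraints.
cs = automl.get_configuration_space(X_train, y_train, dataset_name="breast_cancer")
config = cs.sample_configuration()

pipeline, run_info, run_value = automl.fit_pipeline(
    X=X_train,
    y=y_train,
    config=config,
    X_test=X_test,
    y_test=y_test,
)
print(run_value.status, run_value.cost)  # pipeline is None if fitting failed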
- get_configuration_space(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, dataset_name: Optional[str] = None, feat_type: Optional[List[str]] = None) ConfigSpace.configuration_space.ConfigurationSpace ¶
Returns the Configuration Space object, from which Auto-Sklearn will sample configurations and build pipelines.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
Array with the training features, used to get characteristics like data sparsity
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels
- X_test: array-like or sparse matrix of shape = [n_samples, n_features]
Array with features used for performance estimation
- y_test: array-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels for the testing split
- dataset_name: Optional[str]
A string to tag the Auto-Sklearn run
- get_models_with_weights()¶
Return a list of the final ensemble found by auto-sklearn.
- Returns
- [(weight_1, model_1), …, (weight_n, model_n)]
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
- leaderboard(detailed: bool = False, ensemble_only: bool = True, top_k: Union[int, Literal['all']] = 'all', sort_by: str = 'cost', sort_order: Literal['auto', 'ascending', 'descending'] = 'auto', include: Optional[Union[str, Iterable[str]]] = None) pandas.core.frame.DataFrame ¶
Returns a pandas table of results for all evaluated models.
Gives an overview of all models trained during the search process along with various statistics about their training.
The available statistics are:
Simple:
"model_id" - The id given to a model by autosklearn.
"rank" - The rank of the model based on its "cost".
"ensemble_weight" - The weight given to the model in the ensemble.
"type" - The type of classifier/regressor used.
"cost" - The loss of the model on the validation set.
"duration" - Length of time the model was optimized for.
Detailed: The detailed view includes all of the simple statistics along with the following.
"config_id" - The id used by SMAC for optimization.
"budget" - How much budget was allocated to this model.
"status" - The return status of training the model with SMAC.
"train_loss" - The loss of the model on the training set.
"balancing_strategy" - The balancing strategy used for data preprocessing.
"start_time" - Time the model began being optimized.
"end_time" - Time the model ended being optimized.
"data_preprocessors" - The preprocessors used on the data.
"feature_preprocessors" - The preprocessors for feature types.
- Parameters
- detailed: bool = False
Whether to give detailed information or just a simple overview.
- ensemble_only: bool = True
Whether to view only models included in the ensemble or all models trained.
- top_k: int or “all” = “all”
How many models to display.
- sort_by: str = 'cost'
What column to sort by. If that column is not present, the sorting defaults to the "model_id" index column.
Defaults to the metric optimized. In case of a multi-objective optimization problem, sorts by the first objective.
- sort_order: "auto" or "ascending" or "descending" = "auto"
Which sort order to apply to the sort_by column. If left as "auto", it will sort by a sensible default where "better" is on top, otherwise defaulting to the pandas default for DataFrame.sort_values if there is no obvious "better".
- include: Optional[str or Iterable[str]]
Items to include; other items not specified will be excluded. The exception is the "model_id" index column, which is always included.
If left as None, it will fall back to using the detailed param to decide the columns to include.
- Returns
- pd.DataFrame
A dataframe of statistics for the models, ordered by sort_by.
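For example (assumes a fitted automl; the listed columns follow the statistics described above):

# Top five models by cost, simple view.
print(automl.leaderboard(top_k=5))

# Every model trained during the search, with detailed statistics.
detailed = automl.leaderboard(detailed=True, ensemble_only=False)
print(detailed[["type", "cost", "train_loss", "duration"]])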
- predict(X, batch_size=None, n_jobs=1)[source]¶
Predict classes for X.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
- y: array of shape = [n_samples] or [n_samples, n_labels]
The predicted classes.
- predict_proba(X, batch_size=None, n_jobs=1)[source]¶
Predict probabilities of classes for all samples X.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
- batch_size: int (optional)
Number of data points to predict for (predicts all points at once if None).
- n_jobs: int
- Returns
- y: array of shape = [n_samples, n_classes] or [n_samples, n_labels]
The predicted class probabilities.
- refit(X, y)¶
Refit all models found with fit to new data.
Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset, but does not keep any trained model and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. This method may also be used together with holdout to avoid only using 66% of the training data to fit the final model.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
The targets.
- Returns
- self
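A sketch of the cross-validation workflow this enables (the fold count is illustrative):

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)
automl.fit(X_train, y_train)
automl.refit(X_train, y_train)   # required before predict() when using "cv"
predictions = automl.predict(X_test)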
- score(X, y)¶
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters
- X: array-like of shape (n_samples, n_features)
Test samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score: float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
- show_models()¶
Returns a dictionary containing dictionaries of ensemble models.
Each model in the ensemble can be accessed by giving its model_id as key.
A model dictionary contains the following:
"model_id" - The id given to a model by autosklearn.
"rank" - The rank of the model based on its "cost".
"cost" - The loss of the model on the validation set.
"ensemble_weight" - The weight given to the model in the ensemble.
"voting_model" - The cv_voting_ensemble model (for 'cv' resampling).
"estimators" - List of models (dicts) in cv_voting_ensemble ('cv' resampling).
"data_preprocessor" - The preprocessor used on the data.
"balancing" - The balancing used on the data (for classification).
"feature_preprocessor" - The preprocessor for feature types.
"classifier" / "regressor" - The autosklearn wrapped classifier or regressor.
"sklearn_classifier" or "sklearn_regressor" - The sklearn classifier or regressor.
Example
import sklearn.datasets
import sklearn.metrics

import autosklearn.regression

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120
)
automl.fit(X, y, dataset_name='diabetes')

ensemble_dict = automl.show_models()
print(ensemble_dict)
Output:
{
    25: {
        'model_id': 25.0,
        'rank': 1,
        'cost': 0.43667876507897496,
        'ensemble_weight': 0.38,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': SGDRegressor(alpha=0.0006517033225329654,...)
    },
    6: {
        'model_id': 6.0,
        'rank': 2,
        'cost': 0.4550418898836528,
        'ensemble_weight': 0.3,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': ARDRegression(alpha_1=0.0003701926442639788,...)
    },
    ...
}
- Returns
- Dict(int, Any): dictionary of length = number of models in the ensemble
A dictionary of models in the ensemble, where model_id is the key.
- sprint_statistics()¶
Return the following statistics of the training result:
dataset name
metric used
best validation score
number of target algorithm runs
number of successful target algorithm runs
number of crashed target algorithm runs
number of target algorithm runs that exceeded the memory limit
number of target algorithm runs that exceeded the time limit
- Returns
- str
- class autosklearn.experimental.askl2.AutoSklearn2Classifier(time_left_for_this_task: int = 3600, per_run_time_limit=None, ensemble_size: int | None = None, ensemble_class: AbstractEnsemble | None = <class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>, ensemble_kwargs: Dict[str, Any] | None = None, ensemble_nbest: Union[float, int] = 50, max_models_on_disc: int = 50, seed: int = 1, memory_limit: int = 3072, tmp_folder: Optional[str] = None, delete_tmp_folder_after_terminate: bool = True, n_jobs: Optional[int] = None, dask_client: Optional[dask.distributed.Client] = None, disable_evaluator_output: bool = False, smac_scenario_args: Optional[Dict[str, Any]] = None, logging_config: Optional[Dict[str, Any]] = None, metric: Optional[Scorer] = None, scoring_functions: Optional[List[Scorer]] = None, load_models: bool = True, dataset_compression: Union[bool, Mapping[str, Any]] = True, allow_string_features: bool = True, disable_progress_bar: bool = False)[source]¶
- Parameters
- time_left_for_this_task: int, optional (default=3600)
Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.
- per_run_time_limit: int, optional (default=1/10 of time_left_for_this_task)
Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.
- ensemble_size: int, optional
Number of models added to the ensemble built by ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
- ensemble_class: Type[AbstractEnsemble], optional (default=EnsembleSelection)
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble.
- ensemble_kwargs: Dict, optional
Keyword arguments that are passed to the ensemble class upon initialization.
- max_models_on_disc: int, optional (default=50)
Defines the maximum number of models kept on disc. Any additional models are permanently deleted. Because of this, the value also sets the upper limit on how many models can be used in an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on the disc.
- seed: int, optional (default=1)
Used to seed SMAC. Will determine the output file names.
- memory_limit: int, optional (3072)
Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB.
Important notes:
If None is provided, no memory limit is set.
In case of multi-processing, memory_limit will be per job, so the total usage is n_jobs x memory_limit.
The memory limit also applies to the ensemble creation process.
- tmp_folder: string, optional (None)
Folder to store configuration output and log files. If None, /tmp/autosklearn_tmp_$pid_$random_number is used automatically.
- delete_tmp_folder_after_terminate: bool, optional (True)
Remove tmp_folder when finished. If tmp_folder is None, the tmp_dir will always be deleted.
- n_jobs: int, optional, experimental
The number of jobs to run in parallel for fit(). -1 means using all processors.
Important notes:
By default, Auto-sklearn uses one core.
Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble.
predict() is not affected by n_jobs (in contrast to most scikit-learn models).
If dask_client is None, a new dask client is created.
- dask_client: dask.distributed.Client, optional
User-created dask client, can be used to start a dask cluster and then attach auto-sklearn to it.
- disable_evaluator_output: bool or list, optional (False)
If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when setting this to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:
'y_optimization' - do not save the predictions for the optimization/validation set, which would later on be used to build an ensemble.
'model' - do not save any model files.
- smac_scenario_args: dict, optional (None)
Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.
- logging_config: dict, optional (None)
Dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.
- metric: Scorer, optional (None)
An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics. If None is provided, a default metric is selected depending on the task.
- scoring_functions: List[Scorer], optional (None)
List of scorers which will be calculated for each pipeline. Results will be available via cv_results_.
- load_models: bool, optional (True)
Whether to load the models after fitting Auto-sklearn.
- disable_progress_bar: bool = False
Whether to disable the progress bar that is displayed in the console while fitting to the training data.
- Attributes
- cv_results_: dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.
Not all keys returned by scikit-learn are supported yet.
- fit(X, y, X_test=None, y_test=None, metric=None, feat_type=None, dataset_name=None)[source]¶
Fit auto-sklearn to given training set (X, y).
Fit both optimizes the machine learning models and builds an ensemble out of them.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
The target classes.
- X_test: array-like or sparse matrix of shape = [n_samples, n_features]
Test data input samples. Will be used to save test predictions for all models. This allows evaluating the performance of Auto-sklearn over time.
- y_test: array-like, shape = [n_samples] or [n_samples, n_outputs]
Test data target classes. Will be used to calculate the test error of all models. This allows evaluating the performance of Auto-sklearn over time.
- feat_type: list, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.
- dataset_name: str, optional (default=None)
Create nicer output. If None, a string will be determined by the md5 hash of the dataset.
- Returns
- self
- fit_ensemble(y, task: int = None, precision: Literal[16, 32, 64] = 32, dataset_name: Optional[str] = None, ensemble_size: int | None = None, ensemble_kwargs: Optional[Dict[str, Any]] = None, ensemble_nbest: Optional[int] = None, ensemble_class: Type[AbstractEnsemble] | Literal['default'] | None = 'default', metric: Scorer | Sequence[Scorer] | None = None)¶
Fit an ensemble to models trained during an optimization process.
All parameters are None by default. If no other value is given, the default values which were set in a call to fit() are used.
- Parameters
- y: array-like
Target values.
- task: int
A constant from the module autosklearn.constants. Determines the task type (binary classification, multiclass classification, multilabel classification or regression).
- precision: int
Numeric precision used when loading ensemble data. Can be either 16, 32 or 64.
- dataset_name: str
Name of the current data set.
- ensemble_size: int, optional
Number of models added to the ensemble built by ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
- ensemble_kwargs: Dict, optional
Keyword arguments that are passed to the ensemble class upon initialization.
- ensemble_nbest: int
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting Most out of Ensemble Selection. This pruning step is independent of the ensemble_class argument and is done prior to constructing an ensemble.
- ensemble_class: Type[AbstractEnsemble] | "default", optional (default="default")
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble.
If set to "default", EnsembleSelection is used for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
- metric: Scorer | Sequence[Scorer] | None = None
A metric or list of metrics to score the ensemble with.
- Returns
- self
- fit_pipeline(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], config: Union[ConfigSpace.configuration_space.Configuration, Dict[str, Union[str, float, int]]], dataset_name: Optional[str] = None, X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, feat_type: Optional[List[str]] = None, *args, **kwargs: Dict) Tuple[Optional[autosklearn.pipeline.base.BasePipeline], smac.runhistory.runhistory.RunInfo, smac.runhistory.runhistory.RunValue] ¶
Fits an individual pipeline configuration and returns the result to the user.
The Estimator constraints are honored, for example the resampling strategy, or memory constraints, unless directly provided to the method. By default, this method supports the same signature as fit(), and any extra arguments are redirected to the TAE evaluation function, which allows for further customization while building a pipeline.
Any additional argument provided is directly passed to the worker exercising the run.
- Parameters
- X: array-like, shape = (n_samples, n_features)
The features used for training
- y: array-like
The labels used for training
- X_test: Optional array-like, shape = (n_samples, n_features)
If provided, the testing performance will be tracked on these features.
- y_test: array-like
If provided, the testing performance will be tracked on these labels.
- config: Union[Configuration, Dict[str, Union[str, float, int]]]
A configuration object used to define the pipeline steps. If a dict is passed, a configuration is created based on this dict.
- dataset_name: Optional[str]
Name that will be used to tag and identify the Auto-Sklearn run.
- feat_type: list, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.
- Returns
- pipeline: Optional[BasePipeline]
The fitted pipeline. In case of failure while fitting the pipeline, a None is returned.
- run_info: RunInfo
A named tuple that contains the configuration launched
- run_value: RunValue
A named tuple that contains the result of the run
- get_configuration_space(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, dataset_name: Optional[str] = None, feat_type: Optional[List[str]] = None) ConfigSpace.configuration_space.ConfigurationSpace ¶
Returns the Configuration Space object, from which Auto-Sklearn will sample configurations and build pipelines.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
Array with the training features, used to get characteristics like data sparsity
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels
- X_test: array-like or sparse matrix of shape = [n_samples, n_features]
Array with features used for performance estimation
- y_test: array-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels for the testing split
- dataset_name: Optional[str]
A string to tag the Auto-Sklearn run
- get_models_with_weights()¶
Return a list of the final ensemble found by auto-sklearn.
- Returns
- [(weight_1, model_1), …, (weight_n, model_n)]
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
- leaderboard(detailed: bool = False, ensemble_only: bool = True, top_k: Union[int, Literal['all']] = 'all', sort_by: str = 'cost', sort_order: Literal['auto', 'ascending', 'descending'] = 'auto', include: Optional[Union[str, Iterable[str]]] = None) pandas.core.frame.DataFrame ¶
Returns a pandas table of results for all evaluated models.
Gives an overview of all models trained during the search process along with various statistics about their training.
The available statistics are:
Simple:
"model_id" - The id given to a model by autosklearn.
"rank" - The rank of the model based on its "cost".
"ensemble_weight" - The weight given to the model in the ensemble.
"type" - The type of classifier/regressor used.
"cost" - The loss of the model on the validation set.
"duration" - Length of time the model was optimized for.
Detailed: The detailed view includes all of the simple statistics along with the following.
"config_id" - The id used by SMAC for optimization.
"budget" - How much budget was allocated to this model.
"status" - The return status of training the model with SMAC.
"train_loss" - The loss of the model on the training set.
"balancing_strategy" - The balancing strategy used for data preprocessing.
"start_time" - Time the model began being optimized.
"end_time" - Time the model ended being optimized.
"data_preprocessors" - The preprocessors used on the data.
"feature_preprocessors" - The preprocessors for feature types.
- Parameters
- detailed: bool = False
Whether to give detailed information or just a simple overview.
- ensemble_only: bool = True
Whether to view only models included in the ensemble or all models trained.
- top_k: int or “all” = “all”
How many models to display.
- sort_by: str = 'cost'
What column to sort by. If that column is not present, the sorting defaults to the "model_id" index column.
Defaults to the metric optimized. In case of a multi-objective optimization problem, sorts by the first objective.
- sort_order: "auto" or "ascending" or "descending" = "auto"
Which sort order to apply to the sort_by column. If left as "auto", it will sort by a sensible default where "better" is on top, otherwise defaulting to the pandas default for DataFrame.sort_values if there is no obvious "better".
- include: Optional[str or Iterable[str]]
Items to include; other items not specified will be excluded. The exception is the "model_id" index column, which is always included.
If left as None, it will fall back to using the detailed param to decide the columns to include.
- Returns
- pd.DataFrame
A dataframe of statistics for the models, ordered by sort_by.
- predict(X, batch_size=None, n_jobs=1)¶
Predict classes for X.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
- y: array of shape = [n_samples] or [n_samples, n_labels]
The predicted classes.
- predict_proba(X, batch_size=None, n_jobs=1)¶
Predict probabilities of classes for all samples X.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
- batch_size: int (optional)
Number of data points to predict for (predicts all points at once if None).
- n_jobs: int
- Returns
- y: array of shape = [n_samples, n_classes] or [n_samples, n_labels]
The predicted class probabilities.
- refit(X, y)¶
Refit all models found with fit to new data.
Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset, but does not keep any trained model and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. This method may also be used together with holdout to avoid only using 66% of the training data to fit the final model.
- Parameters
- X: array-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- y: array-like, shape = [n_samples] or [n_samples, n_outputs]
The targets.
- Returns
- self
- score(X, y)¶
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters
- X: array-like of shape (n_samples, n_features)
Test samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight: array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score: float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
- show_models()¶
Returns a dictionary containing dictionaries of ensemble models.
Each model in the ensemble can be accessed by giving its model_id as key.
A model dictionary contains the following:
"model_id" - The id given to a model by autosklearn.
"rank" - The rank of the model based on its "cost".
"cost" - The loss of the model on the validation set.
"ensemble_weight" - The weight given to the model in the ensemble.
"voting_model" - The cv_voting_ensemble model (for 'cv' resampling).
"estimators" - List of models (dicts) in cv_voting_ensemble ('cv' resampling).
"data_preprocessor" - The preprocessor used on the data.
"balancing" - The balancing used on the data (for classification).
"feature_preprocessor" - The preprocessor for feature types.
"classifier" / "regressor" - The autosklearn wrapped classifier or regressor.
"sklearn_classifier" or "sklearn_regressor" - The sklearn classifier or regressor.
Example
import sklearn.datasets
import sklearn.metrics

import autosklearn.regression

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120
)
automl.fit(X, y, dataset_name='diabetes')

ensemble_dict = automl.show_models()
print(ensemble_dict)
Output:
{
    25: {
        'model_id': 25.0,
        'rank': 1,
        'cost': 0.43667876507897496,
        'ensemble_weight': 0.38,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': SGDRegressor(alpha=0.0006517033225329654,...)
    },
    6: {
        'model_id': 6.0,
        'rank': 2,
        'cost': 0.4550418898836528,
        'ensemble_weight': 0.3,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': ARDRegression(alpha_1=0.0003701926442639788,...)
    },
    ...
}
- Returns
- Dict(int, Any): dictionary of length = number of models in the ensemble
A dictionary of models in the ensemble, where model_id is the key.
- sprint_statistics()¶
Return the following statistics of the training result:
dataset name
metric used
best validation score
number of target algorithm runs
number of successful target algorithm runs
number of crashed target algorithm runs
number of target algorithm runs that exceeded the memory limit
number of target algorithm runs that exceeded the time limit
- Returns
- str
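A minimal sketch of using this class; note from the signature above that it exposes fewer arguments than AutoSklearnClassifier (for example, no include/exclude or resampling options). The dataset and budget are illustrative.

import sklearn.datasets
import sklearn.model_selection

from autosklearn.experimental.askl2 import AutoSklearn2Classifier

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = AutoSklearn2Classifier(time_left_for_this_task=120)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))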
Regression¶
- class autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=3600, per_run_time_limit=None, initial_configurations_via_metalearning=25, ensemble_size: int | None = None, ensemble_class: Type[AbstractEnsemble] | Literal['default'] | None = 'default', ensemble_kwargs: Dict[str, Any] | None = None, ensemble_nbest=50, max_models_on_disc=50, seed=1, memory_limit=3072, include: Optional[Dict[str, List[str]]] = None, exclude: Optional[Dict[str, List[str]]] = None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, delete_tmp_folder_after_terminate=True, n_jobs: Optional[int] = None, dask_client: Optional[dask.distributed.Client] = None, disable_evaluator_output=False, get_smac_object_callback=None, smac_scenario_args=None, logging_config=None, metadata_directory=None, metric: Scorer | Sequence[Scorer] | None = None, scoring_functions: Optional[List[Scorer]] = None, load_models: bool = True, get_trials_callback: SMACCallback | None = None, dataset_compression: Union[bool, Mapping[str, Any]] = True, allow_string_features: bool = True, disable_progress_bar: bool = False)[source]¶
This class implements the regression task.
- Parameters
- time_left_for_this_task: int, optional (default=3600)
Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.
- per_run_time_limit: int, optional (default=1/10 of time_left_for_this_task)
Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.
- initial_configurations_via_metalearning: int, optional (default=25)
Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.
- ensemble_size: int, optional
Number of models added to the ensemble built by ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
- ensemble_class: Type[AbstractEnsemble] | "default", optional (default="default")
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to obtain only the single best model instead of an ensemble.
If set to "default", EnsembleSelection is used for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
- ensemble_kwargs: Dict, optional
Keyword arguments that are passed to the ensemble class upon initialization.
- ensemble_nbest: int, optional (default=50)
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting Most out of Ensemble Selection. This pruning step is independent of the ensemble_class argument and is done prior to constructing an ensemble.
- max_models_on_disc: int, optional (default=50)
Defines the maximum number of models kept on disc. Any additional models are permanently deleted. Because of this, the value also sets the upper limit on how many models can be used in an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on the disc.
- seed: int, optional (default=1)
Used to seed SMAC. Will determine the output file names.
- memory_limit: int, optional (3072)
Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB.
Important notes:
If None is provided, no memory limit is set.
In case of multi-processing, memory_limit will be per job, so the total usage is n_jobs x memory_limit.
The memory limit also applies to the ensemble creation process.
- include: Optional[Dict[str, List[str]]] = None
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are included in the search. See /pipeline/components/<step>/* for available components.
Incompatible with parameter exclude.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - only when using AutoSklearnClassifier
"regressor" - only when using AutoSklearnRegressor
Example:
include = { 'classifier': ["random_forest"], 'feature_preprocessor': ["no_preprocessing"] }
- exclude: Optional[Dict[str, List[str]]] = None
If None, all possible algorithms are used.
Otherwise, specifies a step and the components that are excluded from the search. See /pipeline/components/<step>/* for available components.
Incompatible with parameter include.
Possible Steps:
"data_preprocessor"
"balancing"
"feature_preprocessor"
"classifier" - only when using AutoSklearnClassifier
"regressor" - only when using AutoSklearnRegressor
Example:
exclude = { 'classifier': ["random_forest"], 'feature_preprocessor': ["no_preprocessing"] }
- resampling_strategy: str | BaseCrossValidator | _RepeatedSplits | BaseShuffleSplit = "holdout"
How to handle overfitting; might need resampling_strategy_arguments if using a "cv"-based method or a Splitter object.
- Options
"holdout" - use a 67:33 (train:test) split
"cv" - perform cross-validation; requires "folds" in resampling_strategy_arguments
"holdout-iterative-fit" - same as "holdout", but iterative fit where possible
"cv-iterative-fit" - same as "cv", but iterative fit where possible
"partial-cv" - same as "cv", but uses intensification
BaseCrossValidator - any BaseCrossValidator subclass (found in the scikit-learn model_selection module)
_RepeatedSplits - any _RepeatedSplits subclass (found in the scikit-learn model_selection module)
BaseShuffleSplit - any BaseShuffleSplit subclass (found in the scikit-learn model_selection module)
If using a Splitter object that relies on the dataset retaining its current size and order, you will need to look at the dataset_compression argument and ensure that "subsample" is not included in the applied compression "methods", or disable dataset compression entirely with False.
- resampling_strategy_arguments: Optional[Dict] = None
Additional arguments for resampling_strategy; required if using a "cv"-based strategy. The default arguments if left as None are:
{
    "train_size": 0.67,    # The size of the training set
    "shuffle": True,       # Whether to shuffle before splitting data
    "folds": 5             # Used in 'cv' based resampling strategies
}
If using a custom splitter class which takes n_splits (such as PredefinedSplit), the value of "folds" will be used.
- tmp_folder: string, optional (None)
Folder to store configuration output and log files. If None, /tmp/autosklearn_tmp_$pid_$random_number is used automatically.
- delete_tmp_folder_after_terminate: bool, optional (True)
Remove tmp_folder when finished. If tmp_folder is None, the tmp_dir will always be deleted.
- n_jobs: int, optional, experimental
The number of jobs to run in parallel for fit(). -1 means using all processors.
Important notes:
By default, Auto-sklearn uses one core.
Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble.
predict() is not affected by n_jobs (in contrast to most scikit-learn models).
If dask_client is None, a new dask client is created.
- dask_client: dask.distributed.Client, optional
User-created dask client, can be used to start a dask cluster and then attach auto-sklearn to it.
- disable_evaluator_output: bool or list, optional (False)
If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when setting this to True. Can also be used as a list to pass more fine-grained information on what to save. Allowed elements in the list are:
'y_optimization' - do not save the predictions for the optimization set, which would later on be used to build an ensemble.
'model' - do not save any model files.
- smac_scenario_args: dict, optional (None)
Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.
- get_smac_object_callbackcallable
Callback function to create an object of class smac.facade.AbstractFacade. The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC.
- logging_config: dict, optional (None)
Dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.
- metadata_directory: str, optional (None)
path to the metadata directory. If None, the default directory (autosklearn.metalearning.files) is used.
- metricScorer, optional (None)
An instance of autosklearn.metrics.Scorer as created by autosklearn.metrics.make_scorer(). These are the Built-in Metrics. If None is provided, a default metric is selected depending on the task.
- scoring_functions: List[Scorer], optional (None)
List of scorers which will be calculated for each pipeline; results will be available via cv_results_.
- load_modelsbool, optional (True)
Whether to load the models after fitting Auto-sklearn.
- get_trials_callback: callable
A callable with the following definition.
(smac.SMBO, smac.RunInfo, smac.RunValue, time_left: float) -> bool | None
This will be called after SMAC, the underlying optimizer for autosklearn, finishes training each run.
You can use this to record your own information about the optimization process, or to enable early stopping based on some criteria, as sketched below.
See the example: Early Stopping And Callbacks.
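A minimal sketch of such a callback (the 0.02 cost threshold is purely illustrative):

import autosklearn.regression

# Returning False from the callback stops the SMAC optimization loop.
def stop_early(smbo, run_info, run_value, time_left):
    if run_value.cost < 0.02:  # illustrative threshold
        return False

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    get_trials_callback=stop_early,
)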
- dataset_compression: Union[bool, Mapping[str, Any]] = True
We compress datasets so that they fit into some predefined amount of memory. Currently this does not apply to dataframes or sparse arrays, only to raw numpy arrays.
NOTE - If using a custom resampling_strategy that relies on a specific size or ordering of data, this must be disabled to preserve these properties.
You can disable this entirely by passing False, or leave it as the default True to use the configuration below:
{
    "memory_allocation": 0.1,
    "methods": ["precision", "subsample"]
}
You can also pass your own configuration with the same keys, choosing from the available "methods". The available options are described here:
- memory_allocation
By default, we attempt to fit the dataset into 0.1 * memory_limit. This float value can be set with "memory_allocation": 0.1. We also allow for specifying absolute memory in MB, e.g. 10MB is "memory_allocation": 10.
The memory used by the dataset is checked after each reduction method is performed. If the dataset fits into the allocated memory, any further methods listed in "methods" will not be performed.
For example, if methods: ["precision", "subsample"] and the "precision" reduction step was enough to make the dataset fit into memory, then the "subsample" reduction step will not be performed.
- methods
We provide the following methods for reducing the dataset size. These can be provided in a list and are performed in the order given.
"precision" - We reduce floating point precision as follows:
np.float128 -> np.float64
np.float96 -> np.float64
np.float64 -> np.float32
"subsample" - We subsample the data so that it fits directly into the memory allocation memory_allocation * memory_limit. Therefore, this should likely be the last method listed in "methods". Subsampling takes into account classification labels and stratifies accordingly. We guarantee that at least one occurrence of each label is included in the sampled set.
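For instance, a minimal sketch of a custom compression configuration (the 512 MB allocation is purely illustrative):

import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    # Absolute allocation of 512 MB; only reduce precision, never subsample,
    # so the dataset size and ordering are preserved.
    dataset_compression={"memory_allocation": 512, "methods": ["precision"]},
)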
- allow_string_features: bool = True
Whether autosklearn should process string features. By default, text preprocessing is enabled.
- disable_progress_bar: bool = False
Whether to disable the progress bar that is displayed in the console while fitting to the training data.
- Attributes
- cv_results_dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
Not all keys returned by scikit-learn are supported yet.
- performance_over_time_pandas.core.frame.DataFrame
A DataFrame containing the models' performance-over-time data, which can be used directly for plotting. Please refer to the example Train and Test Inputs, and the sketch below.
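A minimal sketch of populating and plotting this attribute (X_train, y_train, X_test, y_test are assumed to already exist; the "Timestamp" column name follows the Train and Test Inputs example):

import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=120)
automl.fit(X_train, y_train, X_test=X_test, y_test=y_test)

# One row per point in time; includes train/test scores of the ensemble.
# Plotting requires matplotlib to be installed.
automl.performance_over_time_.plot(x="Timestamp", kind="line")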
- fit(X, y, X_test=None, y_test=None, feat_type=None, dataset_name=None)[source]¶
Fit Auto-sklearn to given training set (X, y).
Fit both optimizes the machine learning models and builds an ensemble out of them.
- Parameters
- Xarray-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- yarray-like, shape = [n_samples] or [n_samples, n_targets]
The regression target.
- X_testarray-like or sparse matrix of shape = [n_samples, n_features]
Test data input samples. Will be used to save test predictions for all models. This allows evaluating the performance of Auto-sklearn over time.
- y_testarray-like, shape = [n_samples] or [n_samples, n_targets]
The regression target. Will be used to calculate the test error of all models. This allows evaluating the performance of Auto-sklearn over time.
- feat_typelist, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded.
- dataset_namestr, optional (default=None)
Create nicer output. If None, a string will be determined by the md5 hash of the dataset.
- Returns
- self
- fit_ensemble(y, task: int = None, precision: Literal[16, 32, 64] = 32, dataset_name: Optional[str] = None, ensemble_size: int | None = None, ensemble_kwargs: Optional[Dict[str, Any]] = None, ensemble_nbest: Optional[int] = None, ensemble_class: Type[AbstractEnsemble] | Literal['default'] | None = 'default', metric: Scorer | Sequence[Scorer] | None = None)¶
Fit an ensemble to models trained during an optimization process.
All parameters are None by default. If no other value is given, the default values which were set in a call to fit() are used.
- Parameters
- yarray-like
Target values.
- taskint
A constant from the module
autosklearn.constants
. Determines the task type (binary classification, multiclass classification, multilabel classification or regression).- precisionint
Numeric precision used when loading ensemble data. Can be either 16, 32 or 64.
- dataset_name: str
Name of the current data set.
- ensemble_sizeint, optional
Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement. If set to 0, no ensemble is fit.
Deprecated - will be removed in Auto-sklearn 0.16. Please pass this argument via ensemble_kwargs={"ensemble_size": int} if you want to change the ensemble size for ensemble selection.
Keyword arguments that are passed to the ensemble class upon initialization.
- ensemble_nbestint
Only consider the ensemble_nbest models when building an ensemble. This is inspired by a concept called library pruning introduced in Getting Most out of Ensemble Selection. This is independent of the ensemble_class argument, and this pruning step is done prior to constructing an ensemble.
- ensemble_class: Type[AbstractEnsemble] | "default", optional (default="default")
Class implementing the post-hoc ensemble algorithm. Set to None to disable ensemble building, or use SingleBest to use only the single best model instead of an ensemble.
If set to "default" it will use EnsembleSelection for single-objective problems and MultiObjectiveDummyEnsemble for multi-objective problems.
- metric: Scorer | Sequence[Scorer] | None = None
A metric or list of metrics to score the ensemble with.
- Returns
- self
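A minimal sketch of searching without an ensemble and building one post hoc (X and y are assumed to already exist):

import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    ensemble_class=None,  # search only; skip ensemble construction
)
automl.fit(X, y)

# Build an ensemble afterwards from the models found during the search.
automl.fit_ensemble(y, ensemble_class="default", ensemble_nbest=25)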
- fit_pipeline(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], config: Union[ConfigSpace.configuration_space.Configuration, Dict[str, Union[str, float, int]]], dataset_name: Optional[str] = None, X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, feat_type: Optional[List[str]] = None, *args, **kwargs: Dict) Tuple[Optional[autosklearn.pipeline.base.BasePipeline], smac.runhistory.runhistory.RunInfo, smac.runhistory.runhistory.RunValue] ¶
Fits an individual pipeline configuration and returns the result to the user.
The Estimator constraints are honored, for example the resampling strategy, or memory constraints, unless directly provided to the method. By default, this method supports the same signature as fit(), and any extra arguments are redirected to the TAE evaluation function, which allows for further customization while building a pipeline.
Any additional argument provided is directly passed to the worker exercising the run.
- Parameters
- X: array-like, shape = (n_samples, n_features)
The features used for training
- y: array-like
The labels used for training
- X_test: Optional array-like, shape = (n_samples, n_features)
If provided, the testing performance will be tracked on these features.
- y_test: array-like
If provided, the testing performance will be tracked on these labels.
- config: Union[Configuration, Dict[str, Union[str, float, int]]]
A configuration object used to define the pipeline steps. If a dict is passed, a configuration is created based on this dict.
- dataset_name: Optional[str]
Name used to tag and identify the Auto-Sklearn run.
- feat_typelist, optional (default=None)
List of str of len(X.shape[1]) describing the attribute type. Possible types are Categorical and Numerical. Categorical attributes will be automatically One-Hot encoded. The values used for a categorical attribute must be integers, obtained for example by sklearn.preprocessing.LabelEncoder.
- Returns
- pipeline: Optional[BasePipeline]
The fitted pipeline. In case of failure while fitting the pipeline, a None is returned.
- run_info: RunInfo
A named tuple that contains the launched configuration.
- run_value: RunValue
A named tuple that contains the result of the run
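A minimal sketch of evaluating one configuration sampled from the search space (X and y are assumed to already exist):

import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=120)

cs = automl.get_configuration_space(X, y)
config = cs.sample_configuration()  # one random configuration

pipeline, run_info, run_value = automl.fit_pipeline(X=X, y=y, config=config)
if pipeline is not None:
    print(run_value.cost)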
- get_configuration_space(X: Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], y: Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix], X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, y_test: Optional[Union[List, pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray, scipy.sparse._base.spmatrix]] = None, dataset_name: Optional[str] = None, feat_type: Optional[List[str]] = None) ConfigSpace.configuration_space.ConfigurationSpace ¶
Returns the Configuration Space object, from which Auto-Sklearn will sample configurations and build pipelines.
- Parameters
- Xarray-like or sparse matrix of shape = [n_samples, n_features]
Array with the training features, used to get characteristics like data sparsity
- yarray-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels
- X_testarray-like or sparse matrix of shape = [n_samples, n_features]
Array with features used for performance estimation
- y_testarray-like, shape = [n_samples] or [n_samples, n_outputs]
Array with the problem labels for the testing split
- dataset_name: Optional[str]
A string to tag the Auto-Sklearn run
- get_models_with_weights()¶
Return the final ensemble found by auto-sklearn as a list of (weight, model) pairs.
- Returns
- [(weight_1, model_1), …, (weight_n, model_n)]
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
- leaderboard(detailed: bool = False, ensemble_only: bool = True, top_k: Union[int, Literal['all']] = 'all', sort_by: str = 'cost', sort_order: Literal['auto', 'ascending', 'descending'] = 'auto', include: Optional[Union[str, Iterable[str]]] = None) pandas.core.frame.DataFrame ¶
Returns a pandas table of results for all evaluated models.
Gives an overview of all models trained during the search process along with various statistics about their training.
The available statistics are:
Simple:
"model_id"
- The id given to a model byautosklearn
."rank"
- The rank of the model based on it’s"cost"
."ensemble_weight"
- The weight given to the model in the ensemble."type"
- The type of classifier/regressor used."cost"
- The loss of the model on the validation set."duration"
- Length of time the model was optimized for.
Detailed: The detailed view includes all of the simple statistics along with the following.
"config_id"
- The id used by SMAC for optimization."budget"
- How much budget was allocated to this model."status"
- The return status of training the model with SMAC."train_loss"
- The loss of the model on the training set."balancing_strategy"
- The balancing strategy used for data preprocessing."start_time"
- Time the model began being optimized"end_time"
- Time the model ended being optimized"data_preprocessors"
- The preprocessors used on the data"feature_preprocessors"
- The preprocessors for features types
- Parameters
- detailed: bool = False
Whether to give detailed information or just a simple overview.
- ensemble_only: bool = True
Whether to view only models included in the ensemble or all models trained.
- top_k: int or “all” = “all”
How many models to display.
- sort_by: str = ‘cost’
What column to sort by. If that column is not present, the sorting defaults to the "model_id" index column.
Defaults to the metric optimized. Sorts by the first objective in case of a multi-objective optimization problem.
- sort_order: “auto” or “ascending” or “descending” = “auto”
Which sort order to apply to the
sort_by
column. If left as"auto"
, it will sort by a sensible default where “better” is on top, otherwise defaulting to the pandas default for DataFrame.sort_values if there is no obvious “better”.- include: Optional[str or Iterable[str]]
Items to include; other items not specified will be excluded. The exception is the "model_id" index column, which is always included.
If left as None, it will fall back to using the detailed param to decide the columns to include.
- Returns
- pd.DataFrame
A dataframe of statistics for the models, ordered by
sort_by
.
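A minimal sketch of inspecting a fitted estimator (automl is assumed to have been fit already):

# Top 10 models in the final ensemble, best first.
print(automl.leaderboard(top_k=10))

# Every model trained during the search, with detailed statistics.
print(automl.leaderboard(detailed=True, ensemble_only=False, sort_by="train_loss"))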
- predict(X, batch_size=None, n_jobs=1)[source]¶
Predict regression target for X.
- Parameters
- Xarray-like or sparse matrix of shape = [n_samples, n_features]
- Returns
- yarray of shape = [n_samples] or [n_samples, n_outputs]
The predicted values.
- refit(X, y)¶
Refit all models found with fit to new data.
Necessary when using cross-validation. During training, auto-sklearn fits each model k times on the dataset, but does not keep any trained model, and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. This method may also be used together with holdout to avoid only using 67% of the training data to fit the final model.
- Parameters
- Xarray-like or sparse matrix of shape = [n_samples, n_features]
The training input samples.
- yarray-like, shape = [n_samples] or [n_samples, n_outputs]
The targets.
- Returns
- self
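A minimal sketch of the cross-validation workflow that requires this method (X, y and X_new are assumed to already exist):

import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)
automl.fit(X, y)
automl.refit(X, y)  # retrain each model once on the full training data
predictions = automl.predict(X_new)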
- score(X, y)¶
Return the coefficient of determination \(R^2\) of the prediction.
The coefficient \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares
((y_true - y_pred) ** 2).sum()
and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape
(n_samples, n_samples_fitted)
, wheren_samples_fitted
is the number of samples used in the fitting for the estimator.- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns
- scorefloat
\(R^2\) of
self.predict(X)
wrt. y.
Notes
The \(R^2\) score used when calling
score
on a regressor usesmultioutput='uniform_average'
from version 0.23 to keep consistent with default value ofr2_score()
. This influences thescore
method of all the multioutput regressors (except forMultiOutputRegressor
).
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
- **paramsdict
Estimator parameters.
- Returns
- selfestimator instance
Estimator instance.
- show_models()¶
Returns a dictionary containing dictionaries of ensemble models.
Each model in the ensemble can be accessed by giving its model_id as key.
A model dictionary contains the following:
"model_id"
- The id given to a model byautosklearn
."rank"
- The rank of the model based on it’s"cost"
."cost"
- The loss of the model on the validation set."ensemble_weight"
- The weight given to the model in the ensemble."voting_model"
- Thecv_voting_ensemble
model (for ‘cv’ resampling)."estimators"
- List of models (dicts) incv_voting_ensemble
(‘cv’ resampling).
"data_preprocessor"
- The preprocessor used on the data."balancing"
- The balancing used on the data (for classification)."feature_preprocessor"
- The preprocessor for features types."classifier"
/"regressor"
- The autosklearn wrapped classifier or regressor."sklearn_classifier"
or"sklearn_regressor"
- The sklearn classifier or regressor.
Example
import sklearn.datasets
import sklearn.metrics
import autosklearn.regression

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120
)
automl.fit(X, y, dataset_name='diabetes')

ensemble_dict = automl.show_models()
print(ensemble_dict)
Output:
{
    25: {
        'model_id': 25.0,
        'rank': 1,
        'cost': 0.43667876507897496,
        'ensemble_weight': 0.38,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': SGDRegressor(alpha=0.0006517033225329654,...)
    },
    6: {
        'model_id': 6.0,
        'rank': 2,
        'cost': 0.4550418898836528,
        'ensemble_weight': 0.3,
        'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing....>,
        'feature_preprocessor': <autosklearn.pipeline.components....>,
        'regressor': <autosklearn.pipeline.components.regression....>,
        'sklearn_regressor': ARDRegression(alpha_1=0.0003701926442639788,...)
    },
    ...
}
- Returns
- Dict(int, Any)dictionary of length = number of models in the ensemble
A dictionary of models in the ensemble, where
model_id
is the key.
- sprint_statistics()¶
Return the following statistics of the training result:
dataset name
metric used
best validation score
number of target algorithm runs
number of successful target algorithm runs
number of crashed target algorithm runs
number of target algorithm runs that exceeded the memory limit
number of target algorithm runs that exceeded the time limit
- Returns
- str
Metrics¶
- autosklearn.metrics.make_scorer(name: str, score_func: Callable, *, optimum: float = 1.0, worst_possible_result: float = 0.0, greater_is_better: bool = True, needs_proba: bool = False, needs_threshold: bool = False, needs_X: bool = False, **kwargs: Any) autosklearn.metrics.Scorer [source]¶
Make a scorer from a performance metric or loss function.
Factory inspired by scikit-learn which wraps scikit-learn scoring functions to be used in auto-sklearn.
- Parameters
- name: str
Descriptive name of the metric
- score_funccallable
Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
- optimum: int or float, default=1
The best score achievable by the score function, i.e. maximum in case of scorer function and minimum in case of loss function.
- worst_possible_result: int or float, default=0
The worst score achievable by the score function, i.e. minimum in case of scorer function and maximum in case of loss function.
- greater_is_betterboolean, default=True
Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.
- needs_probaboolean, default=False
Whether score_func requires predict_proba to get probability estimates out of a classifier.
- needs_thresholdboolean, default=False
Whether score_func takes a continuous decision certainty. This only works for binary classification.
- needs_Xboolean, default=False
Whether score_func requires X in __call__ to compute a metric.
- **kwargsadditional arguments
Additional parameters to be passed to score_func.
- Returns
- scorercallable
Callable object that returns a scalar score where greater is better (pass greater_is_better=False for loss functions).
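A minimal sketch of wrapping a scikit-learn loss function (the sentinel worst-case value is an assumption, since MAE is unbounded above):

import sklearn.metrics
import autosklearn.metrics
import autosklearn.regression

# MAE is a loss: its optimum is 0 and lower is better.
mae_scorer = autosklearn.metrics.make_scorer(
    name="mae",
    score_func=sklearn.metrics.mean_absolute_error,
    optimum=0,
    worst_possible_result=1e10,  # large sentinel; MAE has no upper bound
    greater_is_better=False,
)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    metric=mae_scorer,
)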
Built-in Metrics¶
Classification metrics¶
Note: The default autosklearn.metrics.f1
, autosklearn.metrics.precision
and autosklearn.metrics.recall
built-in metrics are applicable only for binary classification. In order to apply them on multilabel and multiclass
classification, please use the corresponding metrics with an appropriate averaging mechanism, such as autosklearn.metrics.f1_macro
.
For more information about how these metrics are used, please read
this scikit-learn documentation.
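For example, a short sketch of choosing a multiclass-appropriate metric:

import autosklearn.classification
import autosklearn.metrics

# Plain f1 is binary-only; f1_macro averages over all classes.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    metric=autosklearn.metrics.f1_macro,
)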
- autosklearn.metrics.accuracy¶
alias of accuracy
- autosklearn.metrics.balanced_accuracy¶
alias of balanced_accuracy
- autosklearn.metrics.f1¶
alias of f1
- autosklearn.metrics.f1_macro¶
alias of f1_macro
- autosklearn.metrics.f1_micro¶
alias of f1_micro
- autosklearn.metrics.f1_samples¶
alias of f1_samples
- autosklearn.metrics.f1_weighted¶
alias of f1_weighted
- autosklearn.metrics.roc_auc¶
alias of roc_auc
- autosklearn.metrics.precision¶
alias of precision
- autosklearn.metrics.precision_macro¶
alias of precision_macro
- autosklearn.metrics.precision_micro¶
alias of precision_micro
- autosklearn.metrics.precision_samples¶
alias of precision_samples
- autosklearn.metrics.precision_weighted¶
alias of precision_weighted
- autosklearn.metrics.average_precision¶
alias of average_precision
- autosklearn.metrics.recall¶
alias of recall
- autosklearn.metrics.recall_macro¶
alias of recall_macro
- autosklearn.metrics.recall_micro¶
alias of recall_micro
- autosklearn.metrics.recall_samples¶
alias of recall_samples
- autosklearn.metrics.recall_weighted¶
alias of recall_weighted
- autosklearn.metrics.log_loss¶
alias of log_loss
Extension Interfaces¶
- class autosklearn.pipeline.components.base.AutoSklearnClassificationAlgorithm[source]¶
Provide an abstract interface for classification algorithms in auto-sklearn.
See Extending auto-sklearn for more information.
- get_estimator()[source]¶
Return the underlying estimator object.
- Returns
- estimatorthe underlying estimator object
- predict(X)[source]¶
The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.
- Parameters
- Xarray-like, shape = (n_samples, n_features)
- Returns
- array, shape = (n_samples,) or shape = (n_samples, n_labels)
Returns the predicted values
Notes
Please see the scikit-learn API documentation for further information.
- class autosklearn.pipeline.components.base.AutoSklearnRegressionAlgorithm[source]¶
Provide an abstract interface for regression algorithms in auto-sklearn.
Make a subclass of this and put it into the directory autosklearn/pipeline/components/regression to make it available (see the sketch after this class entry).
- get_estimator()[source]¶
Return the underlying estimator object.
- Returns
- estimatorthe underlying estimator object
- predict(X)[source]¶
The predict function calls the predict function of the underlying scikit-learn model and returns an array with the predictions.
- Parameters
- Xarray-like, shape = (n_samples, n_features)
- Returns
- array, shape = (n_samples,) or shape = (n_samples, n_targets)
Returns the predicted values
Notes
Please see the scikit-learn API documentation for further information.
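A rough sketch of such a subclass. The exact signatures of get_properties and get_hyperparameter_search_space vary slightly across versions, the Ridge estimator is purely illustrative, and the add_regressor registration hook is assumed from the Extending auto-sklearn examples:

from ConfigSpace.configuration_space import ConfigurationSpace

from autosklearn.pipeline.components.base import AutoSklearnRegressionAlgorithm
from autosklearn.pipeline.constants import DENSE, SPARSE, UNSIGNED_DATA, PREDICTIONS


class RidgeComponent(AutoSklearnRegressionAlgorithm):
    def __init__(self, random_state=None):
        self.estimator = None
        self.random_state = random_state

    def fit(self, X, y):
        from sklearn.linear_model import Ridge
        self.estimator = Ridge()
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        if self.estimator is None:
            raise NotImplementedError()
        return self.estimator.predict(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        # Keys as used in the extension examples; values describe this component.
        return {
            "shortname": "Ridge",
            "name": "Ridge Regression",
            "handles_regression": True,
            "handles_classification": False,
            "handles_multiclass": False,
            "handles_multilabel": False,
            "handles_multioutput": False,
            "is_deterministic": True,
            "input": (DENSE, SPARSE, UNSIGNED_DATA),
            "output": (PREDICTIONS,),
        }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        return ConfigurationSpace()  # no tuned hyperparameters in this sketch


# Registration hook, as used in the extension examples:
from autosklearn.pipeline.components.regression import add_regressor
add_regressor(RidgeComponent)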
- class autosklearn.pipeline.components.base.AutoSklearnPreprocessingAlgorithm[source]¶
Provide an abstract interface for preprocessing algorithms in auto-sklearn.
See Extending auto-sklearn for more information.
- get_preprocessor()[source]¶
Return the underlying preprocessor object.
- Returns
- preprocessorthe underlying preprocessor object
- transform(X)[source]¶
The transform function calls the transform function of the underlying scikit-learn model and returns the transformed array.
- Parameters
- Xarray-like, shape = (n_samples, n_features)
- Returns
- Xarray
Return the transformed training data
Notes
Please see the scikit-learn API documentation for further information.
Ensembles¶
Single objective¶
- class autosklearn.ensembles.EnsembleSelection(task_type: int, metrics: Sequence[Scorer] | Scorer, backend: Backend, ensemble_size: int = 50, bagging: bool = False, mode: str = 'fast', random_state: int | np.random.RandomState | None = None)[source]¶
An ensemble of selected algorithms.
Fitting an EnsembleSelection generates an ensemble from the models generated during the search process. It can be further used for prediction.
- Parameters
- task_type: int
An identifier indicating which task is being performed.
- metrics: Sequence[Scorer] | Scorer
The metric used to evaluate the models. If multiple metrics are passed, ensemble selection only optimizes for the first.
- backendBackend
Gives access to the backend of Auto-sklearn. Not used by Ensemble Selection.
- bagging: bool = False
Whether to use bagging in ensemble selection
- mode: str in [‘fast’, ‘slow’] = ‘fast’
Which kind of ensemble generation to use:
'slow' - The original method used in Rich Caruana's ensemble selection.
'fast' - A faster version of Rich Caruana's ensemble selection.
- random_state: int | RandomState | None = None
The random_state used for ensemble selection.
None - Uses numpy’s default RandomState object
int - Successive calls to fit will produce the same results
RandomState - Truly random, each call to fit will produce different results, even with the same object.
References
Ensemble selection from libraries of models. Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew and Alex Ksikes. ICML 2004.
- fit(base_models_predictions: List[np.ndarray], true_targets: np.ndarray, model_identifiers: List[Tuple[int, int, float]], runs: Sequence[Run], X_data: SUPPORTED_FEAT_TYPES | None = None) EnsembleSelection [source]¶
Fit an ensemble given predictions of base models and targets.
Ensemble building maximizes performance (in contrast to hyperparameter optimization)!
- Parameters
- base_models_predictions: np.ndarray
shape = (n_base_models, n_data_points, n_targets) n_targets is the number of classes in case of classification, n_targets is 0 or 1 in case of regression
Can be a list of 2d numpy arrays as well to prevent copying all predictions into a single, large numpy array.
- X_datalist-like or sparse data
- true_targetsarray of shape [n_targets]
- model_identifiersidentifier for each base model.
Can be used for practical text output of the ensemble.
- runs: Sequence[Run]
Additional information for each run executed by SMAC that was considered by the ensemble builder.
- Returns
- self
- get_identifiers_with_weights() List[Tuple[Tuple[int, int, float], float]] [source]¶
Return (identifier, weight) pairs for all models that were passed to the ensemble builder.
- Parameters
- modelsdict {identifiermodel object}
The identifiers are the same as the ones presented to the fit() method. Models can be used for nice printing.
- Returns
- List[Tuple[Tuple[int, int, float], float]]
- get_models_with_weights(models: Dict[Tuple[int, int, float], autosklearn.pipeline.base.BasePipeline]) List[Tuple[float, autosklearn.pipeline.base.BasePipeline]] [source]¶
List of (weight, model) pairs for all models included in the ensemble.
- Parameters
- modelsdict {identifiermodel object}
The identifiers are the same as the ones presented to the fit() method. Models can be used for nice printing.
- Returns
- List[Tuple[float, BasePipeline]]
- get_selected_model_identifiers() List[Tuple[int, int, float]] [source]¶
Return identifiers of models in the ensemble.
This includes models which have a weight of zero!
- Returns
- list
- get_validation_performance() float [source]¶
Return validation performance of ensemble.
- Returns
- float
- predict(base_models_predictions: Union[numpy.ndarray, List[numpy.ndarray]]) numpy.ndarray [source]¶
Create ensemble predictions from the base model predictions.
- Parameters
- base_models_predictionsnp.ndarray
shape = (n_base_models, n_data_points, n_targets) Same as in the fit method.
- Returns
- np.ndarray
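A minimal sketch of selecting this ensemble class explicitly when constructing an estimator (the ensemble_size value is illustrative):

import autosklearn.regression
from autosklearn.ensembles import EnsembleSelection

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    ensemble_class=EnsembleSelection,
    ensemble_kwargs={"ensemble_size": 10},
)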
Single model classes¶
These classes wrap a single model to provide a unified interface in Auto-sklearn.
- class autosklearn.ensembles.SingleBest(task_type: int, metrics: Sequence[Scorer] | Scorer, backend: Backend, random_state: int | np.random.RandomState | None = None)[source]¶
Ensemble consisting of the single best model.
- Parameters
- task_type: int
An identifier indicating which task is being performed.
- metrics: Sequence[Scorer] | Scorer
The metrics used to evaluate the models.
- random_state: int | RandomState | None = None
Not used.
- backendBackend
Gives access to the backend of Auto-sklearn. Not used.
- fit(base_models_predictions: np.ndarray | list[np.ndarray], true_targets: np.ndarray, model_identifiers: list[tuple[int, int, float]], runs: Sequence[Run], X_data: SUPPORTED_FEAT_TYPES | None = None) SingleBest [source]¶
Select the single best model.
- Parameters
- base_models_predictions: np.ndarray
shape = (n_base_models, n_data_points, n_targets) n_targets is the number of classes in case of classification, n_targets is 0 or 1 in case of regression
Can be a list of 2d numpy arrays as well to prevent copying all predictions into a single, large numpy array.
- true_targetsarray of shape [n_targets]
- model_identifiersidentifier for each base model.
Can be used for practical text output of the ensemble.
- runs: Sequence[Run]
Additional information for each run executed by SMAC that was considered by the ensemble builder. Not used.
- X_dataarray-like | sparse matrix | None = None
- Returns
- self
- class autosklearn.ensembles.SingleModelEnsemble(task_type: int, metrics: Sequence[Scorer] | Scorer, backend: Backend, model_index: int, random_state: int | np.random.RandomState | None = None)[source]¶
Ensemble consisting of a single model.
This class is used by the MultiObjectiveDummyEnsemble to represent ensembles consisting of a single model. Do not use this class on its own!
- Parameters
- task_type: int
An identifier indicating which task is being performed.
- metrics: Sequence[Scorer] | Scorer
The metrics used to evaluate the models.
- backendBackend
Gives access to the backend of Auto-sklearn. Not used.
- model_indexint
Index of the model that constitutes the ensemble. This index will be used to select the correct predictions that will be passed during
fit
andpredict
.- random_state: int | RandomState | None = None
Not used.
- fit(base_models_predictions: np.ndarray | list[np.ndarray], true_targets: np.ndarray, model_identifiers: list[tuple[int, int, float]], runs: Sequence[Run], X_data: SUPPORTED_FEAT_TYPES | None = None) SingleModelEnsemble [source]¶
Dummy implementation of the
fit
method.
The actual work of passing the model index is done in the constructor. This method only stores the identifier of the selected model and computes its validation loss.
- Parameters
- base_models_predictions: np.ndarray
shape = (n_base_models, n_data_points, n_targets) n_targets is the number of classes in case of classification, n_targets is 0 or 1 in case of regression
Can be a list of 2d numpy arrays as well to prevent copying all predictions into a single, large numpy array.
- true_targetsarray of shape [n_targets]
- model_identifiersidentifier for each base model.
Can be used for practical text output of the ensemble.
- runs: Sequence[Run]
Additional information for each run executed by SMAC that was considered by the ensemble builder. Not used.
- X_datalist-like | spmatrix | None = None
X data to feed to a metric if it requires it
- Returns
- self
- class autosklearn.ensembles.SingleBestFromRunhistory(task_type: int, metrics: Sequence[Scorer] | Scorer, backend: Backend, run_history: RunHistory, seed: int, random_state: int | np.random.RandomState | None = None)[source]¶
In the case of a crash, this class searches for the best individual model.
Such a model is returned as an ensemble of a single object, to comply with the expected interface of an AbstractEnsemble.
Do not use by yourself!
Multi-objective¶
- class autosklearn.ensembles.MultiObjectiveDummyEnsemble(task_type: int, metrics: Sequence[Scorer] | Scorer, backend: Backend, random_state: int | np.random.RandomState | None = None)[source]¶
A dummy implementation of a multi-objective ensemble.
Builds one single-model ensemble for each individual model on the Pareto front.
- Parameters
- task_type: int
An identifier indicating which task is being performed.
- metrics: Sequence[Scorer] | Scorer
The metrics used to evaluate the models.
- backendBackend
Gives access to the backend of Auto-sklearn. Not used.
- random_state: int | RandomState | None = None
Not used.
- fit(base_models_predictions: list[np.ndarray], true_targets: np.ndarray, model_identifiers: list[tuple[int, int, float]], runs: Sequence[Run], X_data: SUPPORTED_FEAT_TYPES | None = None) MultiObjectiveDummyEnsemble [source]¶
Select dummy ensembles given predictions of base models and targets.
- Parameters
- base_models_predictions: np.ndarray
shape = (n_base_models, n_data_points, n_targets) n_targets is the number of classes in case of classification, n_targets is 0 or 1 in case of regression
Can be a list of 2d numpy arrays as well to prevent copying all predictions into a single, large numpy array.
- true_targetsarray of shape [n_targets]
- model_identifiersidentifier for each base model.
Can be used for practical text output of the ensemble.
- runs: Sequence[Run]
Additional information for each run executed by SMAC that was considered by the ensemble builder. Not used.
- X_datalist-like | sparse matrix | None = None
X data to give to the metric if required
- Returns
- self
- get_identifiers_with_weights() list[tuple[tuple[int, int, float], float]] [source]¶
Return (identifier, weight) pairs for all models that were passed to the ensemble builder, based on the ensemble that is best for the 1st metric.
- Parameters
- modelsdict {identifiermodel object}
The identifiers are the same as the ones presented to the fit() method. Models can be used for nice printing.
- Returns
- list[tuple[tuple[int, int, float], float]]
- get_models_with_weights(models: dict[tuple[int, int, float], BasePipeline]) list[tuple[float, BasePipeline]] [source]¶
Return a list of (weight, model) pairs for the ensemble that is best for the 1st metric.
- Parameters
- modelsdict {identifiermodel object}
The identifiers are the same as the ones presented to the fit() method. Models can be used for nice printing.
- Returns
- list[tuple[float, BasePipeline]]
- get_selected_model_identifiers() list[tuple[int, int, float]] [source]¶
Return identifiers of models in the ensemble that is best for the 1st metric.
This includes models which have a weight of zero!
- Returns
- list
- get_validation_performance() float [source]¶
Validation performance of the ensemble that is best for the 1st metric.
- Returns
- float
- property pareto_set: Sequence[autosklearn.ensembles.abstract_ensemble.AbstractEnsemble]¶
Get a sequence of ensembles that are on the Pareto front.
- Returns
- Sequence[AbstractEnsemble]
- Raises
- SklearnNotFittedError
If fit has not been called and the Pareto set does not exist yet.