APIs

Main modules

Classification

class autoPyTorch.api.tabular_classification.TabularClassificationTask(seed: int = 1, n_jobs: int = 1, logging_config: Optional[Dict] = None, ensemble_size: int = 50, ensemble_nbest: int = 50, max_models_on_disc: int = 50, temporary_directory: Optional[str] = None, output_directory: Optional[str] = None, delete_tmp_folder_after_terminate: bool = True, delete_output_folder_after_terminate: bool = True, include_components: Optional[Dict] = None, exclude_components: Optional[Dict] = None, resampling_strategy: Union[autoPyTorch.datasets.resampling_strategy.CrossValTypes, autoPyTorch.datasets.resampling_strategy.HoldoutValTypes] = <HoldoutValTypes.holdout_validation: 6>, resampling_strategy_args: Optional[Dict[str, Any]] = None, backend: Optional[autoPyTorch.utils.backend.Backend] = None, search_space_updates: Optional[autoPyTorch.utils.hyperparameter_search_space_update.HyperparameterSearchSpaceUpdates] = None)

Tabular Classification API to the pipelines.

Args:

seed (int):

seed to be used for reproducibility.

n_jobs (int), (default=1):

number of parallel processes to spawn.

logging_config (Optional[Dict]):

specifies the configuration for logging. If None, it is loaded from logging.yaml

ensemble_size (int), (default=50):

Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement.

ensemble_nbest (int), (default=50):

only consider the ensemble_nbest best-performing models when building the ensemble

max_models_on_disc (int), (default=50):

maximum number of models saved to disc. Also, controls the size of the ensemble as any additional models will be deleted. Must be greater than or equal to 1.

temporary_directory (str):

folder to store configuration output and log file

output_directory (str):

folder to store predictions for optional test set

delete_tmp_folder_after_terminate (bool):

determines whether to delete the temporary directory when finished

include_components (Optional[Dict]):

If None, all possible components are used. Otherwise, specifies the set of components to use.

exclude_components (Optional[Dict]):

If None, all possible components are used. Otherwise, specifies the set of components not to use. Incompatible with include_components.
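A minimal instantiation sketch using a few of the arguments above (values are illustrative, not recommendations):

    from autoPyTorch.api.tabular_classification import TabularClassificationTask

    # Build the task object; unspecified arguments keep the defaults above.
    api = TabularClassificationTask(
        seed=42,            # fix the seed for reproducibility
        n_jobs=1,           # single worker process
        ensemble_size=20,   # at most 20 (possibly repeated) ensemble members
        ensemble_nbest=10,  # draw ensemble members from the 10 best models
    )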

build_pipeline(dataset_properties: Dict[str, Any]) → autoPyTorch.pipeline.tabular_classification.TabularClassificationPipeline

Build a pipeline according to the current task and the passed dataset properties.

Args:

dataset_properties (Dict[str,Any]):

Returns:

(TabularClassificationPipeline): pipeline compatible with the given dataset properties

fit(dataset: autoPyTorch.datasets.base_dataset.BaseDataset, budget_config: Dict[str, Union[int, str]] = {}, pipeline_config: Optional[ConfigSpace.configuration_space.Configuration] = None, split_id: int = 0) → autoPyTorch.pipeline.base_pipeline.BasePipeline

Fit a pipeline on the given task for the given budget. A pipeline configuration can be specified; if None, the default configuration is used.

Args:

dataset: (Dataset)

The argument that will provide the dataset splits. It can either be a dictionary with the splits, or the dataset object which can generate the splits based on different restrictions.

budget_config: (Optional[Dict[str, Union[int, str]]])

can contain the key ‘budget_type’ and the budget itself, specified via ‘epochs’ or ‘runtime’.

split_id: (int) (default=0)

split id to fit on.

pipeline_config: (Optional[Configuration])

configuration to fit the pipeline with. If None, the default is used.

Returns:

(BasePipeline): fitted pipeline
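An illustrative sketch of fitting a single pipeline. It assumes `dataset` is an existing autoPyTorch BaseDataset instance (for example, a TabularDataset built from the training data) and uses the budget_config keys documented above:

    # Fit one pipeline with the default configuration for 5 epochs.
    pipeline = api.fit(
        dataset=dataset,
        budget_config={'budget_type': 'epochs', 'epochs': 5},
        split_id=0,
    )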

get_pipeline_options() → dict

Returns the current pipeline configuration.

get_search_space(dataset: Optional[autoPyTorch.datasets.base_dataset.BaseDataset] = None) → ConfigSpace.configuration_space.ConfigurationSpace

Returns the current search space as ConfigurationSpace object.
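A short sketch of inspecting the search space. It assumes `api` already has a dataset associated (e.g. after search() has run); otherwise, pass a BaseDataset via the dataset argument:

    search_space = api.get_search_space()
    print(search_space)                             # hyperparameters and their ranges
    print(len(search_space.get_hyperparameters()))  # number of hyperparameters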

predict(X_test: numpy.ndarray, batch_size: Optional[int] = None, n_jobs: int = 1) → numpy.ndarray

Generate the estimator predictions based on the given examples from the test set.

Args:

X_test: (np.ndarray)

The test set examples.

Returns:

Array with estimator predictions.
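An illustrative call, assuming the estimator has already been fitted via search() or fit(); batch_size and n_jobs follow the signature above:

    # Predict in batches of 512 test examples.
    y_pred = api.predict(X_test, batch_size=512, n_jobs=1)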

refit(dataset: autoPyTorch.datasets.base_dataset.BaseDataset, budget_config: Dict[str, Union[int, str]] = {}, split_id: int = 0) → autoPyTorch.api.base_task.BaseTask

Refit all models found with fit to new data.

Necessary when using cross-validation. During training, autoPyTorch fits each model k times on the dataset, but does not keep any trained model and can therefore not be used to predict for new data points. This method fits all models found during a call to fit on the given data. It may also be used together with holdout to avoid only using 66% of the training data to fit the final model.

Args:

dataset: (Dataset)

The argument that will provide the dataset splits. It can either be a dictionary with the splits, or the dataset object which can generate the splits based on different restrictions.

budget_config: (Optional[Dict[str, Union[int, str]]])

can contain the key ‘budget_type’ and the budget itself, specified via ‘epochs’ or ‘runtime’.

split_id: (int)

split id to fit on.

Returns:

self
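A sketch of the cross-validation workflow described above, assuming `dataset` is the BaseDataset used during the original search:

    # Refit the models found during search on the full training data.
    api.refit(
        dataset=dataset,
        budget_config={'budget_type': 'epochs', 'epochs': 50},  # keys as documented above
        split_id=0,
    )
    y_pred = api.predict(X_test)  # now valid for new data points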

score(y_pred: numpy.ndarray, y_test: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → Dict[str, float]

Calculate the evaluation measure on the test set.

Args:

y_pred: (np.ndarray)

The test predictions

y_test: (np.ndarray)

The test ground truth labels.

Returns:

Dict[str, float]: Value of the evaluation metric calculated on the test set.
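A short sketch of scoring held-out predictions; the printed dictionary is illustrative, keyed by the evaluation metric:

    y_pred = api.predict(X_test)
    results = api.score(y_pred, y_test)
    print(results)  # e.g. {'accuracy': 0.93}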

search(optimize_metric: str, X_train: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray]] = None, y_train: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray]] = None, X_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray]] = None, y_test: Optional[Union[List, pandas.core.frame.DataFrame, numpy.ndarray]] = None, dataset_name: Optional[str] = None, budget_type: Optional[str] = None, budget: Optional[float] = None, total_walltime_limit: int = 100, func_eval_time_limit_secs: Optional[int] = None, enable_traditional_pipeline: bool = True, memory_limit: Optional[int] = 4096, smac_scenario_args: Optional[Dict[str, Any]] = None, get_smac_object_callback: Optional[Callable] = None, all_supported_metrics: bool = True, precision: int = 32, disable_file_output: List = [], load_models: bool = True) → autoPyTorch.api.base_task.BaseTask

Search for the best pipeline configuration for the given dataset.

search() both optimizes the machine learning models using the optimizer and builds an ensemble out of them. To disable ensembling, set ensemble_size == 0.

Args:

X_train, y_train, X_test, y_test: Union[np.ndarray, List, pd.DataFrame]

A pair of features (X_train) and targets (y_train) used to fit a pipeline. Additionally, a holdout of these pairs (X_test, y_test) can be provided to track the generalization performance of each stage.

optimize_metric (str):

name of the metric that is used to evaluate a pipeline.

budget_type (Optional[str]):

Type of budget to be used when fitting the pipeline. Either ‘epochs’ or ‘runtime’. If not provided, uses the default in the pipeline config (‘epochs’)

budget (Optional[float]):

Budget to fit a single run of the pipeline. If not provided, uses the default in the pipeline config

total_walltime_limit (int), (default=100):

Time limit in seconds for the search of appropriate models. By increasing this value, autopytorch has a higher chance of finding better models.

func_eval_time_limit_secs (int), (default=None):

Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data. When set to None, this time will automatically be set to total_walltime_limit // 2 to allow enough time to fit at least 2 individual machine learning algorithms. Set to np.inf in case no time limit is desired.

enable_traditional_pipeline (bool), (default=True):

We fit traditional machine learning algorithms (LightGBM, CatBoost, RandomForest, ExtraTrees, KNN, SVM) before building PyTorch neural networks. You can disable this feature by setting this flag to False. All machine learning algorithms that are fitted during search() are considered for ensemble building.

memory_limit (Optional[int]), (default=4096):

Memory limit in MB for the machine learning algorithm. autopytorch will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB. If None is provided, no memory limit is set. In case of multi-processing, memory_limit will be per job. This memory limit also applies to the ensemble creation process.

smac_scenario_args (Optional[Dict]):

Additional arguments inserted into the scenario of SMAC. See the SMAC documentation: https://automl.github.io/SMAC3/master/options.html?highlight=scenario#scenario

get_smac_object_callback (Optional[Callable]):

Callback function to create an object of class smac.optimizer.smbo.SMBO (https://automl.github.io/SMAC3/master/apidoc/smac.optimizer.smbo.html). The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC (https://automl.github.io/SMAC3/master/index.html).

all_supported_metrics (bool), (default=True):

If True, all metrics supported by the current task will be calculated for each pipeline, and the results will be available via cv_results.

precision (int), (default=32):

Numeric precision used when loading ensemble data. Can be either 16, 32 or 64.

disable_file_output (Union[bool, List]):

load_models (bool), (default=True):

Whether to load the models after fitting AutoPyTorch.

Returns:

self
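A minimal end-to-end sketch of search() on a scikit-learn toy dataset; the metric name and time limits are illustrative choices, not recommendations:

    import sklearn.datasets
    import sklearn.model_selection

    from autoPyTorch.api.tabular_classification import TabularClassificationTask

    # Load a small binary classification dataset and split it.
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, random_state=1)

    api = TabularClassificationTask(seed=42)
    api.search(
        optimize_metric='accuracy',
        X_train=X_train, y_train=y_train,
        X_test=X_test, y_test=y_test,
        total_walltime_limit=300,      # overall search budget in seconds
        func_eval_time_limit_secs=50,  # per-pipeline fit limit in seconds
    )

    y_pred = api.predict(X_test)
    print(api.score(y_pred, y_test))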

set_pipeline_config(**pipeline_config_kwargs: Any) → None

Checks whether the arguments are valid and then sets them in the current pipeline configuration.

Args:

**pipeline_config_kwargs: Valid config options include “num_run”, “device”, “budget_type”, “epochs”, “runtime”, “torch_num_threads”, “early_stopping”, “use_tensorboard_logger”, “metrics_during_training”

Returns:

None
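An illustrative call using option names from the list above (values arbitrary):

    # Train on GPU for 20 epochs per pipeline, using 2 torch threads.
    api.set_pipeline_config(device='cuda', epochs=20, torch_num_threads=2)
    print(api.get_pipeline_options())  # reflects the updated configuration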