Parallel Usage on a single machine

Auto-sklearn uses `dask.distributed <https://distributed.dask.org>`_ for parallel optimization.

This example shows how to start Auto-sklearn so that it uses multiple cores on a single machine. In this mode, Auto-sklearn starts a dask cluster, manages the workers, and takes care of shutting the cluster down once the computation is done. To run Auto-sklearn on multiple machines, see the example Parallel Usage: Spawning workers from the command line.

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

import autosklearn.classification

Data Loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
Build and fit a classifier

To use n_jobs we must guard the code in an if __name__ == "__main__" block.
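The guard is needed because spawned worker processes re-import the main module; without it, each import would try to launch workers of its own. A minimal stand-alone sketch of the pattern, using the standard-library concurrent.futures rather than auto-sklearn's dask backend:

```python
import concurrent.futures


def square(x):
    # Work that each worker process executes.
    return x * x


# With the "spawn" start method, every worker process re-imports this
# module. The guard ensures only the parent process starts the pool.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(square, range(5)))
    print(results)  # [0, 1, 4, 9, 16]
```

The same rule applies to the auto-sklearn code below whenever n_jobs is greater than one.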

if __name__ == "__main__":

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        n_jobs=4,
        # Each one of the 4 jobs is allocated 3GB
        memory_limit=3072,
    )
    automl.fit(X_train, y_train, dataset_name="breast_cancer")

    # Print statistics about the auto-sklearn run, such as the number of
    # iterations and the number of models that failed with a time out.
    print(automl.sprint_statistics())
auto-sklearn results:
  Dataset name: breast_cancer
  Metric: accuracy
  Best validation score: 0.985816
  Number of target algorithm runs: 43
  Number of successful target algorithm runs: 43
  Number of crashed target algorithm runs: 0
  Number of target algorithms that exceeded the time limit: 0
  Number of target algorithms that exceeded the memory limit: 0
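After the run, the fitted ensemble is scored on the held-out split in the usual scikit-learn way, by passing automl.predict(X_test) to a metric. A minimal sketch with stand-in label arrays (so it runs without repeating the two-minute search):

```python
import sklearn.metrics

# Stand-in arrays; with the fitted model above you would instead use
#   predictions = automl.predict(X_test)
y_test = [0, 1, 1, 0, 1]
predictions = [0, 1, 1, 1, 1]

accuracy = sklearn.metrics.accuracy_score(y_test, predictions)
print("Accuracy score:", accuracy)  # Accuracy score: 0.8
```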

Total running time of the script: (2 minutes 1.338 seconds)
