Parallel Usage on a single machine
Auto-sklearn uses dask.distributed (https://distributed.dask.org/en/latest/index.html) for parallel optimization.
This example shows how to start Auto-sklearn so that it uses multiple cores on a single machine. In this mode, Auto-sklearn starts a dask cluster, manages the workers, and shuts the cluster down once the computation is done. To run Auto-sklearn on multiple machines, see the example Parallel Usage: Spawning workers from the command line.
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
Build and fit a classifier
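To use n_jobs we must guard the code with if __name__ == '__main__':, so that worker processes which re-import this script do not start the optimization again.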
if __name__ == '__main__':
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder='/tmp/autosklearn_parallel_1_example_tmp',
        n_jobs=4,
        # Each one of the 4 jobs is allocated 3GB
        memory_limit=3072,
        seed=5,
    )
    automl.fit(X_train, y_train, dataset_name='breast_cancer')

    # Print statistics about the auto-sklearn run such as number of
    # iterations, number of models failed with a time out.
    print(automl.sprint_statistics())
auto-sklearn results:
  Dataset name: breast_cancer
  Metric: accuracy
  Best validation score: 0.985816
  Number of target algorithm runs: 49
  Number of successful target algorithm runs: 48
  Number of crashed target algorithm runs: 0
  Number of target algorithms that exceeded the time limit: 1
  Number of target algorithms that exceeded the memory limit: 0
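The statistics are returned as a plain string, so it can be handy to turn them into numbers, for example to log the share of successful runs. The following is a minimal sketch that assumes the "key: value" layout of the output shown above; parse_statistics is a hypothetical helper, not part of Auto-sklearn, and the sample text is copied from this run rather than produced by calling the library.

```python
# Sample text copied from the run above; a real script would pass the
# string returned by automl.sprint_statistics() instead.
STATS_TEXT = """auto-sklearn results:
  Dataset name: breast_cancer
  Metric: accuracy
  Best validation score: 0.985816
  Number of target algorithm runs: 49
  Number of successful target algorithm runs: 48
  Number of crashed target algorithm runs: 0
  Number of target algorithms that exceeded the time limit: 1
  Number of target algorithms that exceeded the memory limit: 0
"""


def parse_statistics(text):
    """Hypothetical helper: turn 'key: value' lines into a dict.

    Values that look like integers or floats are converted;
    everything else is kept as a string.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # header line or blank line, no 'key: value' pair
        for cast in (int, float):
            try:
                stats[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            stats[key] = value  # neither int nor float: keep the text
    return stats


stats = parse_statistics(STATS_TEXT)
runs = stats["Number of target algorithm runs"]
ok = stats["Number of successful target algorithm runs"]
success_rate = ok / runs
print(f"{ok}/{runs} runs succeeded ({success_rate:.1%})")
# prints "48/49 runs succeeded (98.0%)"
```

If the output layout differs in another Auto-sklearn version, the keys used here would need to be adjusted.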
Total running time of the script: ( 2 minutes 24.442 seconds)
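As the comment in the classifier configuration notes, memory_limit applies to each job, so the total memory the run may need grows with n_jobs. The sketch below works through that arithmetic for the values used in this example. total_memory_budget_mb is a hypothetical helper, and the result is only a rough upper bound: it assumes every job reaches its full limit at the same time and ignores the main process and any dask overhead.

```python
def total_memory_budget_mb(n_jobs, memory_limit_mb):
    """Hypothetical helper: rough upper bound on memory for all jobs.

    Assumes each job may use its full per-job limit simultaneously.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return n_jobs * memory_limit_mb


# Values taken from the example: n_jobs=4, memory_limit=3072 (MB per job)
budget_mb = total_memory_budget_mb(n_jobs=4, memory_limit_mb=3072)
print(f"{budget_mb} MB = {budget_mb / 1024:.0f} GB")
# prints "12288 MB = 12 GB"
```

On a machine with less free memory than this bound, lowering n_jobs or memory_limit is one way to keep the run within what is available.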