Skip to content

Parallelism#

To facilitate parallel execution, SMAC supports executing multiple workers simultaneously via Dask. Using this functionality, splits SMAC into a main process, and DASK workers which handle the execution. The main job handles the optimization process, and coordinates the executor jobs. The executors are queried with the target function and hyperparameter configurations, execute them, and return their result. The executors remain open between different executions.

Note

Please keep in mind that additional workers are only used to evaluate trials. The main thread still orchestrates the optimization process, including training the surrogate model.

Warning

When using multiple workers, SMAC is not reproducible anymore.

Parallelizing Locally#

To utilize parallelism locally, that means running workers on the same machine as the main jobs, specify the n_workers keyword when creating the scenario.

Scenario(model.configspace, n_workers=5)

Parallelizing on SLURM#

To utilize this split of main and execution jobs on a SLURM cluster, SMAC supports manually specifying a Dask client. This allows executing the target function on dedicated SLURM jobs that are necessarily configured with the same hardware requirements,.

Note

While most SLURM clusters behave similarly, the example DASK client might not work for every cluster. For example, some clusters only allow spawning new jobs from the login node.

To configure SMAC properly for each cluster, you need to know the ports which allow communication between main and worker jobs. The dask client is then created as follows:

...
from smac import BlackBoxFacade, Scenario
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="partition_name",                 # Name of the partition
    cores=4,                                # CPU cores requested
    memory="4 GB",                          # RAM requested
    walltime="00:10:00",                    # Walltime limit for a runner job. 
    processes=1,                            # Number of processes per worker
    log_directory="tmp/smac_dask_slurm",    # Logging directory
    nanny=False,                            # False unless you want to use pynisher
    worker_extra_args=[
        "--worker-port",                    # Worker port range 
        "60010:60100"],                     # Worker port range 
    scheduler_options={
        "port": 60001,                      # Main Job Port
    },
)
cluster.scale(jobs=n_workers)

# Dask creates n_workers jobs on the cluster which stay open.
client = Client(
    address=cluster,
)

#Dask waits for n_workers workers to be created
client.wait_for_workers(n_workers)

# Now we use SMAC to find the best hyperparameters
smac = BlackBoxFacade(
    scenario,                               # Pass scenario
    model.train,                            # Pass Pass target-function
    overwrite=True,                         # Overrides any previous result
    dask_client=client,                     # Pass dask_client
)
incumbent = smac.optimize()

The full example of this code is given in parallelism example.