Dask runner

smac.runner.dask_runner

DaskParallelRunner
DaskParallelRunner(
single_worker: AbstractRunner,
patience: int = 5,
dask_client: Client | None = None,
)
Bases: AbstractRunner
Interface to submit and collect jobs in a distributed fashion. DaskParallelRunner is
intended to comply with the bridge design pattern. Nevertheless, to reduce the amount of code
duplicated between single and parallel implementations, DaskParallelRunner wraps an AbstractRunner
object, which is then executed in parallel on n_workers.
This class is constructed by passing an AbstractRunner that implements
a run method and is capable of doing so in a serial fashion. This wrapper class then uses dask to
initialize N instances of that AbstractRunner, which actively wait for a
TrialInfo to produce a RunInfo object.
To be more precise, the work model is:
- The intensifier dictates "what" to run (a configuration/instance/seed) via a TrialInfo object.
- An abstract runner takes this TrialInfo object and launches the task via submit_trial. In the case of DaskParallelRunner, n_workers receive a pickled DaskParallelRunner.single_worker, each with a run method coming from DaskParallelRunner.single_worker.run().
- TrialInfo objects are run in a distributed fashion, and their results are available locally to each worker. The results are collected by iter_results and then passed to SMBO.
- Exceptions are also locally available to each worker and need to be collected.
Dask works with Future objects, which are managed via the DaskParallelRunner.client.
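The work model above can be sketched with Python's standard concurrent.futures as a stand-in for dask (all names here are hypothetical illustrations; in SMAC the futures are dask Future objects managed by DaskParallelRunner.client, and TrialInfo/TrialValue are richer objects):

```python
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical, simplified stand-ins for SMAC's TrialInfo/TrialValue.
TrialInfo = namedtuple("TrialInfo", ["config", "instance", "seed"])
TrialValue = namedtuple("TrialValue", ["cost", "status"])

class SingleWorker:
    """Serial runner: knows how to run one trial (cf. AbstractRunner.run)."""
    def run(self, trial_info: TrialInfo) -> TrialValue:
        # Toy objective: cost depends only on the configuration value.
        return TrialValue(cost=trial_info.config ** 2, status="SUCCESS")

class ParallelRunner:
    """Wraps a serial runner and fans trials out to n_workers (bridge pattern)."""
    def __init__(self, single_worker: SingleWorker, n_workers: int = 2):
        self.single_worker = single_worker
        self.pool = ThreadPoolExecutor(max_workers=n_workers)
        self.futures = []

    def submit_trial(self, trial_info: TrialInfo) -> None:
        # Each worker executes single_worker.run(); the result stays in the
        # future until collected, mirroring dask's locally-held results.
        self.futures.append(self.pool.submit(self.single_worker.run, trial_info))

    def iter_results(self):
        for future in as_completed(self.futures):
            yield future.result()  # exceptions would also surface here

runner = ParallelRunner(SingleWorker(), n_workers=2)
for config in (1, 2, 3):
    runner.submit_trial(TrialInfo(config=config, instance=None, seed=0))
costs = sorted(value.cost for value in runner.iter_results())
print(costs)  # [1, 4, 9]
```

The sketch shows why the bridge pattern pays off: the parallel runner contains no objective-function logic at all, only submission and collection.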
| PARAMETER | DESCRIPTION |
|---|---|
| single_worker | A runner to run in a distributed fashion. Will be distributed using n_workers. TYPE: AbstractRunner |
| patience | How long to wait (in seconds) for workers to be available if one fails. TYPE: int |
| dask_client | User-created dask client, which can be used to start a dask cluster and then attach SMAC to it. This will not be closed automatically and will have to be closed manually if provided explicitly. If none is provided (default), a local one will be created for you and closed upon completion. TYPE: Client \| None |
Source code in smac/runner/dask_runner.py
__del__
Makes sure that when this object gets deleted, the client is terminated. This is only done if the client was created by the dask runner.
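The cleanup rule can be illustrated with a minimal sketch of the ownership pattern (names are hypothetical; in SMAC the flag is internal state and the client is a dask.distributed.Client):

```python
class FakeClient:
    """Hypothetical stand-in for dask.distributed.Client."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class Runner:
    def __init__(self, dask_client=None):
        # Only a client we created ourselves is ours to terminate.
        self._client_is_internal = dask_client is None
        self._client = dask_client if dask_client is not None else FakeClient()

    def close(self):  # in SMAC, __del__ triggers this kind of cleanup
        if self._client_is_internal:
            self._client.close()

external = FakeClient()
Runner(dask_client=external).close()
print(external.closed)  # False: a user-provided client stays open

internal_runner = Runner()
internal_runner.close()
print(internal_runner._client.closed)  # True: an internally created client is closed
```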
count_available_workers
count_available_workers() -> int
Total number of workers available. This number is dynamic as more resources can be allocated.
run_wrapper
run_wrapper(
trial_info: TrialInfo,
**dask_data_to_scatter: dict[str, Any]
) -> tuple[TrialInfo, TrialValue]
Wrapper around run() to execute and check the execution of a given config. This function encapsulates common handling/processing so that the run() implementation is simplified.
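A minimal sketch of what such a wrapper typically encapsulates, assuming a toy run() and simplified return values (SMAC's actual run_wrapper returns TrialInfo/TrialValue objects and handles more status codes):

```python
import time

def run(config):
    """A toy run() implementation that may raise."""
    if config < 0:
        raise ValueError("negative configuration")
    return config * 2.0

def run_wrapper(config):
    """Common handling (timing, exception capture) lives here,
    so each run() implementation stays simple."""
    start = time.time()
    try:
        cost, status = run(config), "SUCCESS"
    except Exception as error:
        # A crashed trial becomes a well-formed result instead of
        # propagating the exception to the optimizer.
        cost, status = float("inf"), f"CRASHED ({error})"
    return {"cost": cost, "status": status, "runtime": time.time() - start}

print(run_wrapper(3))
print(run_wrapper(-1)["status"])  # CRASHED (negative configuration)
```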
| PARAMETER | DESCRIPTION |
|---|---|
| trial_info | Object that contains enough information to execute a configuration run in isolation. TYPE: TrialInfo |
| dask_data_to_scatter | When a user scatters data from their local process to the distributed network, it is distributed in a round-robin fashion, grouped by the number of cores. Roughly speaking, this lets us keep the data in memory so that we do not have to (de-)serialize it every time we execute a target function with a big dataset. This argument is very useful when, for example, your target function shares a big dataset across all trials. TYPE: dict[str, Any] |
| RETURNS | DESCRIPTION |
|---|---|
| info | An object containing the configuration launched. TYPE: TrialInfo |
| value | Contains information about the status/performance of config. TYPE: TrialValue |
Source code in smac/runner/abstract_runner.py
submit_trial

This function submits a configuration embedded in a trial_info object and uses one of
the workers to produce a result, which is available locally to that worker.
The execution of a configuration follows this procedure:

1. The SMBO/intensifier generates a TrialInfo.
2. SMBO calls submit_trial so that a worker launches the trial_info.
3. submit_trial internally calls self.run(). It does so via a call to run_wrapper, which contains common code that any run method would otherwise have to implement.

All results are only available locally to each worker, so the main node needs to collect them.
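The three-step procedure can be sketched as a non-blocking submit followed by a later collect, again using concurrent.futures in place of dask (the trial objects are reduced to plain ints for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the worker pool that dask manages.
pool = ThreadPoolExecutor(max_workers=2)

def run(trial_info):
    """Step 3: the worker-side run(), producing a result local to the worker."""
    return trial_info * 10

# Step 1: the intensifier produces trial infos (here, plain ints).
trials = [1, 2, 3]

# Step 2: submit_trial launches each trial without waiting for its result.
futures = [pool.submit(run, trial) for trial in trials]

# The main node collects the locally-produced results afterwards.
results = sorted(future.result() for future in futures)
print(results)  # [10, 20, 30]
```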
| PARAMETER | DESCRIPTION |
|---|---|
| trial_info | An object containing the configuration launched. TYPE: TrialInfo |
| dask_data_to_scatter | When a user scatters data from their local process to the distributed network, it is distributed in a round-robin fashion, grouped by the number of cores. Roughly speaking, this lets us keep the data in memory so that we do not have to (de-)serialize it every time we execute a target function with a big dataset. This argument is very useful when, for example, your target function shares a big dataset across all trials. |