smac.tae.dask_runner¶
Classes
DaskParallelRunner: Interface to submit and collect a job in a distributed fashion.
- class smac.tae.dask_runner.DaskParallelRunner(single_worker, n_workers, patience=5, output_directory=None, dask_client=None)[source]¶
Bases: smac.tae.base.BaseRunner
Interface to submit and collect a job in a distributed fashion.
DaskParallelRunner is intended to comply with the bridge design pattern.
However, to reduce code duplication between the single and parallel implementations, DaskParallelRunner wraps a BaseRunner object, which is then executed in parallel on n_workers.
The class is constructed by passing a BaseRunner that implements a run() method and can execute it serially. DaskParallelRunner then uses dask to initialize n_workers copies of that BaseRunner, each of which waits for a RunInfo and produces a RunValue object.
To be more precise, the work model is:
1. The smbo.intensifier dictates “what” to run (a configuration/instance/seed) via a RunInfo object.
2. A tae_runner takes this RunInfo object and launches the task via tae_runner.submit_run(). In the case of DaskParallelRunner, n_workers receive a pickled copy of DaskParallelRunner.single_worker, each with a run() method coming from DaskParallelRunner.single_worker.run().
3. RunInfo objects are run in a distributed fashion, and their results are available locally to each worker. These results are collected by DaskParallelRunner.get_finished_runs() and then passed to the SMBO.
4. Exceptions are also locally available to each worker and need to be collected.
Dask works with Future objects, which are managed via DaskParallelRunner.client.
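The submit-then-collect work model can be sketched without dask or SMAC installed. The example below is only an illustration: it swaps in the standard library's ThreadPoolExecutor for the dask client, and RunInfo, RunValue, SerialRunner and MiniParallelRunner are hypothetical stand-ins, not SMAC's classes.

```python
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for SMAC's RunInfo / RunValue containers.
RunInfo = namedtuple("RunInfo", ["config", "instance", "seed"])
RunValue = namedtuple("RunValue", ["cost", "status"])


class SerialRunner:
    """Plays the role of the wrapped single_worker: runs one job at a time."""

    def run(self, run_info):
        # Toy target algorithm: the cost is the square of parameter "x".
        return RunValue(cost=run_info.config["x"] ** 2, status="SUCCESS")


class MiniParallelRunner:
    """Illustrative wrapper: submit jobs, then collect whatever has finished."""

    def __init__(self, single_worker, n_workers):
        self.single_worker = single_worker
        # Assumption: a thread pool stands in for the dask client here.
        self.pool = ThreadPoolExecutor(max_workers=n_workers)
        self.futures = []
        self.results = []

    def submit_run(self, run_info):
        # Hand the job to a worker; only a future is kept locally.
        future = self.pool.submit(self.single_worker.run, run_info)
        self.futures.append((run_info, future))

    def _extract_completed(self):
        # Move finished futures into self.results. future.result() re-raises
        # any exception that occurred inside the worker.
        still_running = []
        for run_info, future in self.futures:
            if future.done():
                self.results.append((run_info, future.result()))
            else:
                still_running.append((run_info, future))
        self.futures = still_running

    def get_finished_runs(self):
        # Return everything collected so far and empty the buffer.
        self._extract_completed()
        finished, self.results = self.results, []
        return finished

    def pending_runs(self):
        self._extract_completed()
        return len(self.futures) > 0


def demo():
    runner = MiniParallelRunner(SerialRunner(), n_workers=2)
    for x in (1, 2, 3):
        runner.submit_run(RunInfo(config={"x": x}, instance=None, seed=0))
    runner.pool.shutdown(wait=True)  # block until every submitted job is done
    finished = runner.get_finished_runs()
    costs = sorted(value.cost for _, value in finished)
    leftover = runner.get_finished_runs()  # buffer was emptied by the first call
    return costs, leftover, runner.pending_runs()
```

Calling demo() submits three jobs, waits for them, and collects the results once. The second call to get_finished_runs() returns an empty list because the first call drained the buffer.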
- Parameters
single_worker (BaseRunner) – A runner to run in a distributed fashion
n_workers (int) – Number of workers to use for the distributed run. Ignored if dask_client is not None.
patience (int) – How long to wait for workers to become available if one fails
output_directory (str, optional) – If given, this will be used for the dask worker directory and for storing server information. If a dask client is passed, it will only be used for storing server information as the worker directory must be set by the program/user starting the workers.
dask_client (dask.distributed.Client) – User-created dask client, can be used to start a dask cluster and then attach SMAC to it.
- results¶
- ta¶
- stats¶
- run_obj¶
- par_factor¶
- cost_for_crash¶
- abort_on_first_run_crash¶
- n_workers¶
- futures¶
- client¶
- __del__()[source]¶
Make sure that when this object gets deleted, the client is terminated.
This is only done if the client was created by the dask runner.
- Return type
None
- get_finished_runs()[source]¶
Returns a list with the results of every configuration that has finished running. The class keeps appending results to self.results until get_finished_runs() is called. At that point, the self.results list is emptied and all RunValues produced by run() are returned.
- num_workers()[source]¶
Total number of workers available.
This number is dynamic, as more resources can be allocated.
- Return type
int
- pending_runs()[source]¶
Whether or not there are configs still running.
Generally, if the runner is serial, launching a run instantly returns its result. On parallel runners, there might be pending configurations still to complete.
- Return type
bool
- run(config, instance, cutoff=None, seed=12345, budget=None, instance_specific='0')[source]¶
This method exists only to comply with the abstract parent class. In the parallel case, it calls the single worker's run() method.
- Parameters
config (Configuration) – dictionary param -> value
instance (string) – problem instance
cutoff (float, optional) – Wallclock time limit of the target algorithm. If no value is provided no limit will be enforced.
seed (int) – random seed
budget (float, optional) – A positive, real-valued number representing an arbitrary limit to the target algorithm. Handled by the target algorithm internally
instance_specific (str) – instance specific information (e.g., domain file or solution)
- Return type
Tuple[StatusType, float, float, Dict]
- Returns
status (enum of StatusType (int)) – {SUCCESS, TIMEOUT, CRASHED, ABORT}
cost (float) – cost/regret/quality (float) (None, if not returned by TA)
runtime (float) – runtime (None if not returned by TA)
additional_info (dict) – all further additional run information
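As a rough illustration of this return contract, the sketch below shows a toy run() that produces the four-element tuple. The StatusType enum, the COST_FOR_CRASH value and the cost function are all hypothetical stand-ins chosen for the example; they are not taken from SMAC.

```python
import time
from enum import Enum
from typing import Dict, Tuple


class StatusType(Enum):
    # Hypothetical stand-in listing the statuses named above.
    SUCCESS = 1
    TIMEOUT = 2
    CRASHED = 3
    ABORT = 4


# Assumed penalty reported when the target algorithm cannot produce a cost.
COST_FOR_CRASH = float(2**31 - 1)


def run(config: Dict[str, float], seed: int = 12345) -> Tuple[StatusType, float, float, Dict]:
    """Toy target algorithm returning (status, cost, runtime, additional_info)."""
    start = time.perf_counter()
    x = config["x"]
    if x == 0:
        # The toy cost 1/|x| is undefined here, so report a crash.
        status, cost = StatusType.CRASHED, COST_FOR_CRASH
        info = {"error": "x must be non-zero", "seed": seed}
    else:
        status, cost = StatusType.SUCCESS, 1.0 / abs(x)
        info = {"seed": seed}
    runtime = time.perf_counter() - start
    return status, cost, runtime, info
```

A caller would typically unpack the result as `status, cost, runtime, additional_info = run({"x": 4.0})` and branch on status before using cost.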
- submit_run(run_info)[source]¶
This function submits a configuration embedded in a run_info object and uses one of the workers to produce a result, which is stored locally on that worker.
The execution of a configuration follows this procedure:
1. SMBO/intensifier generates a run_info.
2. SMBO calls submit_run so that a worker launches the run_info.
3. submit_run internally calls self.run(). It does so via a call to self.run_wrapper(), which contains common code that any run() method would otherwise have to implement, such as the capping check.
Child classes must implement a run() method. All results will be only available locally to each worker, so the main node needs to collect them.
- Parameters
run_info (RunInfo) – An object containing the configuration and the necessary data to run it
- Return type
None
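The split between run_wrapper() and run() described for submit_run resembles a template-method arrangement, sketched below. This is not SMAC's actual run_wrapper(): SketchRunner and QuadraticRunner are hypothetical, and the specific checks (rejecting a non-positive cutoff, timing the call, turning exceptions into a crashed result) are assumptions about what such shared code might contain.

```python
import math
import time
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class SketchRunner(ABC):
    """Hypothetical base class: shared logic lives in run_wrapper()."""

    @abstractmethod
    def run(self, config: Dict, cutoff: Optional[float] = None) -> Tuple[str, float]:
        """Child classes implement the actual target-algorithm call."""

    def run_wrapper(self, config: Dict, cutoff: Optional[float] = None) -> Tuple[str, float, float]:
        # Shared pre-check (assumed form of a capping check).
        if cutoff is not None and cutoff <= 0:
            raise ValueError("cutoff must be positive when given")
        start = time.perf_counter()
        try:
            status, cost = self.run(config, cutoff=cutoff)
        except Exception:
            # Shared error handling: a failing run() becomes a crashed result.
            status, cost = "CRASHED", math.inf
        elapsed = time.perf_counter() - start
        return status, cost, elapsed


class QuadraticRunner(SketchRunner):
    """Child class: only the target-specific part is written here."""

    def run(self, config, cutoff=None):
        return "SUCCESS", config["x"] ** 2
```

With this layout, `QuadraticRunner().run_wrapper({"x": 3})` goes through the shared checks before reaching the child's run(), so every subclass gets the same validation and error handling without repeating it.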