smac.tae.dask_runner

Classes

DaskParallelRunner(single_worker, n_workers)

Interface to submit and collect a job in a distributed fashion.

class smac.tae.dask_runner.DaskParallelRunner(single_worker, n_workers, patience=5, output_directory=None, dask_client=None)[source]

Bases: smac.tae.base.BaseRunner

Interface to submit and collect a job in a distributed fashion.

DaskParallelRunner is intended to comply with the bridge design pattern.

Nevertheless, to reduce the amount of code within single-vs-parallel implementations, DaskParallelRunner wraps a BaseRunner object which is then executed in parallel on n_workers.

This class is constructed by passing a BaseRunner that implements a run() method and is capable of doing so in a serial fashion. The wrapper class DaskParallelRunner then uses Dask to initialize n_workers instances of that BaseRunner, which actively wait for a RunInfo in order to produce a RunValue object.

To be more precise, the work model is as follows:

  1. The smbo.intensifier dictates “what” to run (a configuration/instance/seed) via a RunInfo object.

  2. A tae_runner takes this RunInfo object and launches the task via tae_runner.submit_run(). In the case of DaskParallelRunner, n_workers receive a pickled copy of DaskParallelRunner.single_worker, each with a run() method coming from DaskParallelRunner.single_worker.run().

  3. RunInfo objects are run in a distributed fashion, and their results are available locally to each worker. These results are collected by DaskParallelRunner.get_finished_runs() and then passed to the SMBO.

  4. Exceptions are also locally available to each worker and need to be collected.

Dask works with Future objects, which are managed via DaskParallelRunner.client.
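A minimal sketch of this submit-and-collect loop, assuming a DaskParallelRunner instance named runner and an iterable run_infos of RunInfo objects produced elsewhere (both names are hypothetical stand-ins for what SMBO/intensifier would normally provide):

    # Hypothetical illustration of the work model above; `runner` and
    # `run_infos` are assumed to exist and are not part of this module.
    for run_info in run_infos:
        runner.submit_run(run_info)      # step 2: a worker launches the run

    while runner.pending_runs():         # step 3: results live on the workers
        runner.wait()                    # block until at least one run finishes
        for run_info, run_value in runner.get_finished_runs():
            print(run_info.config, run_value.cost, run_value.status)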

Parameters
  • single_worker (BaseRunner) – A runner to run in a distributed fashion

  • n_workers (int) – Number of workers to use for distributed run. Will be ignored if dask_client is not None.

  • patience (int) – How long to wait for workers to become available if one fails

  • output_directory (str, optional) – If given, this will be used for the dask worker directory and for storing server information. If a dask client is passed, it will only be used for storing server information as the worker directory must be set by the program/user starting the workers.

  • dask_client (dask.distributed.Client) – User-created dask client, can be used to start a dask cluster and then attach SMAC to it.
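As an illustration of these parameters, the two typical construction modes might look as sketched below; single_worker is assumed to be a BaseRunner configured elsewhere, and the output directory name is hypothetical:

    from dask.distributed import Client

    from smac.tae.dask_runner import DaskParallelRunner

    # Mode 1: let DaskParallelRunner spawn a local Dask cluster with 4 workers.
    runner = DaskParallelRunner(single_worker=single_worker, n_workers=4)

    # Mode 2: attach to a user-created Dask client; n_workers is then ignored
    # and output_directory is only used to store the scheduler information.
    client = Client()  # or Client("tcp://<scheduler-address>:8786") for a remote cluster
    runner = DaskParallelRunner(
        single_worker=single_worker,
        n_workers=1,                      # ignored because dask_client is given
        dask_client=client,
        output_directory="smac_output",   # hypothetical path
    )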

results
ta
stats
run_obj
par_factor
cost_for_crash
abort_on_first_run_crash
n_workers
futures
client
__del__()[source]

Make sure that when this object gets deleted, the client is terminated.

This is only done if the client was created by the dask runner.

Return type

None

get_finished_runs()[source]

This method returns the results of any finished configurations as a list of (RunInfo, RunValue) tuples. The runner keeps appending results to self.results until get_finished_runs() is called; at that point self.results is emptied and all RunValues produced by run() are returned.

Return type

List[Tuple[RunInfo, RunValue]]

Returns

  • List[Tuple[RunInfo, RunValue]] – A list of RunValues (and their respective RunInfo), that is, the results of executing the submitted configurations.
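For illustration, assuming some runs have already finished, draining the internal buffer might look like this (runner is an existing DaskParallelRunner):

    # Each call drains self.results, so a given result is reported only once.
    for run_info, run_value in runner.get_finished_runs():
        # RunValue carries the cost, runtime, status and additional info.
        print(run_value.status, run_value.cost, run_value.time)

    # A second call returns an empty list unless new runs have finished since.
    leftover = runner.get_finished_runs()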

num_workers()[source]

Total number of workers available.

This number is dynamic, as more resources can be allocated.

Return type

int

pending_runs()[source]

Whether or not there are configs still running.

Generally, if the runner is serial, launching a run instantly returns its result. On parallel runners, there might be pending configurations left to complete.

Return type

bool

run(config, instance, cutoff=None, seed=12345, budget=None, instance_specific='0')[source]

This method exists only to comply with the abstract parent class. In the parallel case, it calls the single worker's run() method.

Parameters
  • config (Configuration) – dictionary param -> value

  • instance (string) – problem instance

  • cutoff (float, optional) – Wallclock time limit of the target algorithm. If no value is provided no limit will be enforced.

  • seed (int) – random seed

  • budget (float, optional) – A positive, real-valued number representing an arbitrary limit to the target algorithm. Handled by the target algorithm internally

  • instance_specific (str) – instance specific information (e.g., domain file or solution)

Return type

Tuple[StatusType, float, float, Dict]

Returns

  • status (enum of StatusType (int)) – {SUCCESS, TIMEOUT, CRASHED, ABORT}

  • cost (float) – cost/regret/quality (float) (None, if not returned by TA)

  • runtime (float) – runtime (None if not returned by TA)

  • additional_info (dict) – all further additional run information
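A small sketch of calling run() directly and unpacking its return value; normally submit_run() is used instead, and config here is a hypothetical placeholder for a Configuration created elsewhere:

    # Direct, serial call for illustration only; in the parallel case this
    # simply delegates to single_worker.run().
    status, cost, runtime, additional_info = runner.run(
        config=config,              # a Configuration sampled elsewhere (assumed)
        instance="instance-0",      # hypothetical problem-instance name
        cutoff=30.0,                # 30s wallclock limit for the target algorithm
        seed=12345,
    )
    if status.name == "SUCCESS":    # status is a StatusType enum member
        print(f"cost={cost:.4f} runtime={runtime:.2f}s extra={additional_info}")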

submit_run(run_info)[source]

This function submits the configuration embedded in a run_info object and uses one of the workers to produce a result, which is available locally on that worker.

The execution of a configuration follows this procedure:

  1. SMBO/intensifier generates a run_info.

  2. SMBO calls submit_run so that a worker launches the run_info.

  3. submit_run internally calls self.run(). It does so via a call to self.run_wrapper(), which contains common code that any run() method would otherwise have to implement, such as capping checks.

Child classes must implement a run() method. All results will only be available locally to each worker, so the main node needs to collect them.

Parameters

run_info (RunInfo) – An object containing the configuration and the necessary data to run it

Return type

None

wait()[source]

SMBO/intensifier might need to wait for runs to finish before making a decision.

This method waits until at least one run completes.

Return type

None