The Worker – the muscle of HpBandSter
class hpbandster.core.worker.Worker(run_id, nameserver=None, nameserver_port=None, logger=None, host=None, id=None, timeout=None)

The worker is responsible for evaluating a single configuration on a single budget at a time. Communication with the individual workers goes via the nameserver. The Dispatcher manages the worker pool and schedules jobs, and the Master determines which jobs to run. In distributed systems, each cluster node runs a Worker instance.

To implement your own worker, override the __init__ and compute methods. The former lets you perform initial computations when the worker is started, e.g. loading the dataset. The latter is called repeatedly during the optimization, evaluating a given configuration and returning the associated loss.
Parameters:
- run_id (anything with a __str__ method) – unique id that identifies the individual HpBandSter run
- nameserver (str) – hostname or IP of the nameserver
- nameserver_port (int) – port of the nameserver
- logger (logging.Logger instance) – logger used for debugging output
- host (str) – hostname for this worker process
- id (anything with a __str__ method) – if multiple workers are started in the same process, you MUST give each of them a unique id via this argument.
- timeout (int or float) – how long a worker waits for a new job after finishing a computation before shutting down. Towards the end of a long run with multiple workers, this helps to shut down idle workers. We recommend a timeout of roughly half the time it takes for the second-largest budget to finish. The default (None) means that the worker waits indefinitely and never shuts down on its own.
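The timeout recommendation above can be turned into a number. The sketch below assumes Hyperband's geometric budget ladder (budgets max_budget / eta**i, down to about min_budget) and, as a loud simplification, that a run's wall-clock time grows linearly with its budget. The helpers `hyperband_budgets` and `suggested_timeout` and the parameter `seconds_per_budget_unit` are hypothetical, not part of HpBandSter.

```python
import math
from typing import List


def hyperband_budgets(min_budget: float, max_budget: float, eta: int = 3) -> List[float]:
    """Hypothetical helper: geometric budget ladder max_budget / eta**i, smallest first."""
    # Number of eta-fold reductions that fit between min_budget and max_budget;
    # the epsilon guards against log() returning e.g. 2.9999999999999996 for an exact power.
    s_max = int(math.floor(math.log(max_budget / min_budget, eta) + 1e-9))
    return [max_budget / eta ** i for i in range(s_max, -1, -1)]


def suggested_timeout(min_budget: float, max_budget: float, eta: int,
                      seconds_per_budget_unit: float) -> float:
    """Hypothetical helper: half the runtime of the second-largest budget.

    ASSUMPTION: runtime scales linearly with budget (seconds_per_budget_unit).
    """
    budgets = hyperband_budgets(min_budget, max_budget, eta)
    # With a single budget there is no "second largest"; fall back to the only one.
    second_largest = budgets[-2] if len(budgets) > 1 else budgets[-1]
    return 0.5 * second_largest * seconds_per_budget_unit


print(hyperband_budgets(1, 27, 3))        # [1.0, 3.0, 9.0, 27.0]
print(suggested_timeout(1, 27, 3, 10.0))  # 45.0
```

With budgets 1 to 27, eta = 3 and an assumed 10 s per budget unit, the second-largest budget (9) would take about 90 s, so a timeout of roughly 45 s follows. In practice a measured runtime is a better input than the linear assumption.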
compute(config_id, config, budget, working_directory)

The method you have to override to implement your computation.
Parameters:
- config_id (tuple) – a triplet of ints that uniquely identifies a configuration. The convention is id = (iteration, budget index, running index), with the following meaning:
  - iteration: the iteration of the optimization algorithm, e.g., for Hyperband, one round of Successive Halving.
  - budget index: the budget (of the current iteration) for which this configuration was sampled by the optimizer. This is only nonzero if the majority of the runs fail and Hyperband resamples to fill empty slots, or if you use a more 'advanced' optimizer.
  - running index: an int >= 0 that sorts the configs into the order they were sampled, i.e. (x,x,0) was sampled before (x,x,1).
- config (dict) – the actual configuration to be evaluated.
- budget (float) – the budget for the evaluation
- working_directory (str) – the name of a directory that is unique to this configuration. Use it to store intermediate results on lower budgets that can be reused later for a larger budget (e.g., for iterative algorithms).
Returns: a dictionary with two mandatory entries:
- 'loss': a numerical value that is MINIMIZED
- 'info': pretty much any built-in Python type, e.g. a dict with lists as values. Because Pyro4 handles the remote function calls, third-party types such as NumPy arrays are not supported!
Return type: dict
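A minimal worker following the contract above might look like the sketch below. It does its one-time setup in __init__, unpacks the config_id triplet in compute, and returns a dict with 'loss' and 'info' built only from built-in types. The class name `QuadraticWorker`, the `offset` argument, the config key "x" and the objective are hypothetical. The fallback base class in the except branch is a stand-in (not HpBandSter code) that exists only so the sketch runs without the library installed.

```python
try:
    from hpbandster.core.worker import Worker
except ImportError:
    # HYPOTHETICAL stand-in so this sketch runs without HpBandSter installed;
    # the real base class handles the nameserver and Pyro4 communication.
    class Worker:
        def __init__(self, *args, **kwargs):
            pass


class QuadraticWorker(Worker):
    """Toy worker with an assumed objective: (x - 1)^2 plus a penalty that shrinks with budget."""

    def __init__(self, *args, offset=0.0, **kwargs):
        super().__init__(*args, **kwargs)
        # One-time setup (e.g. loading a dataset) belongs here; this toy just stores a constant.
        self.offset = offset

    def compute(self, config_id, config, budget, working_directory):
        iteration, budget_index, running_index = config_id
        # Larger budgets give a smaller penalty term, mimicking a better-trained model.
        loss = (config["x"] - 1.0) ** 2 + self.offset + 1.0 / budget
        return {
            "loss": float(loss),  # plain float, not a NumPy scalar
            "info": {  # only built-in types, so it can travel over Pyro4
                "iteration": iteration,
                "budget_index": budget_index,
                "running_index": running_index,
                "budget": float(budget),
            },
        }


worker = QuadraticWorker(run_id="example")
result = worker.compute(config_id=(0, 0, 1), config={"x": 1.0}, budget=4.0, working_directory=".")
print(result["loss"])  # 0.25
```

In a real run you would not call compute yourself. Once the worker is started with run, the optimizer's jobs reach it through the nameserver and compute is invoked once per job.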
load_nameserver_credentials(working_directory, num_tries=60, interval=1)

Loads the nameserver credentials in cases where the master and the workers share a filesystem.
Parameters:
- working_directory (str) – the working directory for the HpBandSter run (see Master)
- num_tries (int) – number of attempts to find the file (default 60)
- interval (float) – waiting period between the attempts (default 1)
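The num_tries and interval parameters describe a bounded polling loop. If interval is in seconds (an assumption here), the defaults mean at most 60 checks spread over roughly 59 seconds. The generic sketch below illustrates only that retry pattern. It is not HpBandSter's implementation, and `wait_for_file` and the file path are hypothetical. It does not parse the credentials, whose file format is not documented here.

```python
import os
import time
from typing import Optional


def wait_for_file(path: str, num_tries: int = 60, interval: float = 1.0) -> Optional[str]:
    """Hypothetical helper: poll for `path` up to `num_tries` times.

    Sleeps `interval` between attempts (not after the last one) and returns the
    file's contents, or None if the file never appears.
    """
    for attempt in range(num_tries):
        if os.path.exists(path):
            with open(path) as f:
                return f.read()
        if attempt < num_tries - 1:
            time.sleep(interval)
    return None
```

Skipping the sleep after the final attempt keeps the worst-case wait at (num_tries - 1) * interval instead of num_tries * interval.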
run(background=False)

Starts the worker.

Parameters:
- background (bool) – If False (the default), the worker is executed in the current thread. If True, a new daemon thread is created that runs the worker. This is useful in a single-worker scenario or when the compute function only simulates work.
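The difference between the two modes is ordinary Python threading. The generic sketch below shows the behavior described for background. It is not HpBandSter's code, and `start_loop` is a hypothetical name. With background=False the call blocks until the work finishes. With background=True it returns a daemon thread immediately, and that thread does not keep the program alive once the main thread exits.

```python
import threading
from typing import Callable, Optional


def start_loop(work: Callable[[], None], background: bool = False) -> Optional[threading.Thread]:
    """Hypothetical helper: run `work` in the caller's thread or in a daemon thread."""
    if background:
        thread = threading.Thread(target=work, daemon=True)
        thread.start()
        return thread  # returns immediately; the caller keeps control
    work()  # blocks until work() returns
    return None
```

This is why a single script can start one worker in the background and then continue to run the optimizer in the main thread.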