Quickstart Guide

What is HpBandSter?

HpBandSter (HyperBand on STERoids) implements recently published methods for optimizing hyperparameters of machine learning algorithms. We designed HpBandSter such that it scales from running sequentially on a local machine to running in parallel on a distributed system.

One of the implemented algorithms is BOHB, which combines Bayesian Optimization and HyperBand to efficiently search for well performing configurations. Learn more about this method by reading our paper, published at ICML 2018.

How to install HpBandSter

HpBandSter can be installed via pip for python3:

pip install hpbandster

If you want to develop on the code, you can install it via

git clone git@github.com:automl/HpBandSter.git
cd HpBandSter
python3 setup.py develop --user

Note

We only support Python3 for HpBandSter!

The basic Ingredients

Whether you want to use HpBandSter locally on your machine or on a cluster, the basic setup is always the same. For now, let’s focus on the most important ingredients needed to apply an optimizer to a new problem:

Implementing a Worker
The worker is responsible for evaluating a given model with a single configuration on a single budget at a time.

Defining the Search Space
Next, the parameters being optimized need to be defined. HpBandSter relies on the ConfigSpace package for that.
Picking the Budgets and the Number of Iterations
To get good performance, HpBandSter needs to know meaningful budgets to use. You also have to specify how many iterations the optimizer performs.

1. Implementing a Worker

The worker is responsible for evaluating a hyperparameter setting and returning the associated loss that is minimized. By deriving from the base class, encoding a new problem consists of implementing two methods: __init__ and compute. The first allows you to perform initial computations, e.g. loading the dataset, when the worker is started, while the latter is called repeatedly during the optimization and evaluates a given configuration, yielding the associated loss.
The worker below demonstrates the concept. It implements a simple toy problem where there is a single parameter x in the configuration and we try to minimize it. The function evaluations are corrupted by some Gaussian noise that shrinks as the budget grows.
import numpy
import time

import ConfigSpace as CS
from hpbandster.core.worker import Worker


class MyWorker(Worker):

    def __init__(self, *args, sleep_interval=0, **kwargs):
        super().__init__(*args, **kwargs)

        self.sleep_interval = sleep_interval

    def compute(self, config, budget, **kwargs):
        """
        Simple example for a compute function
        The loss is just the value of the config plus some noise (that decreases with the budget)

        For dramatization, the function can sleep for a given interval to emphasize
        the speed ups achievable with parallel workers.

        Args:
            config: dictionary containing the sampled configurations by the optimizer
            budget: (float) amount of time/epochs/etc. the model can use to train

        Returns:
            dictionary with mandatory fields:
                'loss' (scalar)
                'info' (dict)
        """

        res = numpy.clip(config['x'] + numpy.random.randn()/budget, config['x']/2, 1.5*config['x'])
        time.sleep(self.sleep_interval)

        return({
                    'loss': float(res),  # this is a mandatory field to run hyperband
                    'info': res  # can be used for any user-defined information - also mandatory
                })
    
    @staticmethod
    def get_configspace():
        config_space = CS.ConfigurationSpace()
        config_space.add_hyperparameter(CS.UniformFloatHyperparameter('x', lower=0, upper=1))
        return(config_space)

2. The Search Space Definition

Every problem needs a description of the search space to be complete. In HpBandSter, a ConfigurationSpace object defines all hyperparameters, their ranges, and potential dependencies between them. In our toy example here, the search space consists of a single continuous parameter x between zero and one. For convenience, we attach the configuration space definition to the worker as a static method. This way, the worker’s compute function and its parameters are neatly combined.

class MyWorker(Worker):
    @staticmethod
    def get_configspace():
        config_space = CS.ConfigurationSpace()
        config_space.add_hyperparameter(CS.UniformFloatHyperparameter('x', lower=0, upper=1))
        return(config_space)

Note

We also support integer and categorical hyperparameters. To express dependencies, the ConfigSpace package also allows you to express conditions and forbidden relations between parameters. For more examples, we refer to the documentation of the ConfigSpace package, or have a look at the Advanced examples.
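To give a flavor of what that looks like, here is a small sketch (the hyperparameter names are purely illustrative; see the ConfigSpace documentation for the full API) with a categorical, an integer, and a conditional float hyperparameter:
import ConfigSpace as CS

cs = CS.ConfigurationSpace()

# a categorical, an integer and a float hyperparameter (names are made up for illustration)
optimizer    = CS.CategoricalHyperparameter('optimizer', ['Adam', 'SGD'])
num_layers   = CS.UniformIntegerHyperparameter('num_layers', lower=1, upper=5)
sgd_momentum = CS.UniformFloatHyperparameter('sgd_momentum', lower=0.0, upper=0.99)
cs.add_hyperparameters([optimizer, num_layers, sgd_momentum])

# sgd_momentum is only active when the SGD optimizer is chosen
cs.add_condition(CS.EqualsCondition(sgd_momentum, optimizer, 'SGD'))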

3. Meaningful Budgets and Number of Iterations

To take advantage of lower fidelity approximations, i.e. budgets lower than max_budget, those lower accuracy evaluations have to be meaningful. As these budgets can mean very different things (epochs of training a neural network, number of data points to train the model, or number of cross-validation folds, to name a few), they have to be specified by the user. This is done by two parameters, called min_budget and max_budget, for all optimizers. For better speed ups, the lower budget should be as small as possible while still being informative. By informative, we mean that the performance on it is a reasonable indicator for the loss on higher budgets. It is hard to be more concrete for the general case; the two budgets are problem dependent and require some domain knowledge.
The number of iterations is usually a much easier parameter to pick. Depending on the optimizer, an iteration requires the computational budget of a couple of function evaluations on max_budget. In general, the more the better, and things become more complicated when multiple workers run in parallel. For now, the number of iterations simply controls how many configurations are evaluated.
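To build some intuition for how the two budgets interact, the successive halving subroutine used by HyperBand and BOHB spaces its budgets geometrically between min_budget and max_budget with a rate eta (3 by default). The snippet below is only an illustration of that spacing, not part of HpBandSter’s API:
def budget_sequence(min_budget, max_budget, eta=3):
    # walk down from max_budget in factors of eta until min_budget is reached
    budgets = [float(max_budget)]
    while budgets[-1] / eta >= min_budget:
        budgets.append(budgets[-1] / eta)
    return sorted(budgets)

print(budget_sequence(9, 243))   # [9.0, 27.0, 81.0, 243.0]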

The first toy examples

Let us now take the above worker and its search space, and run them in a few different settings. Specifically, we will run
  1. locally and sequentially
  2. locally and in parallel (thread based)
  3. locally and in parallel (process based)
  4. distributed in a cluster environment
Each example showcases how to set up HpBandSter in a different environment and highlights its specifics. Every compute environment is slightly different, but it should be easy to bootstrap from one of the examples and adapt it to your specific needs. The first example slowly introduces the main workflow of any HpBandSter run; the following ones gradually add complexity by including more features.

1. A Local and Sequential Run

We are now ready to look at our first real example to illustrate how HpBandSter is used. Every run consists of the same 5 basic steps which we will now cover.
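The snippets in the following steps assume a handful of imports and command line arguments. A minimal setup could look like the following (the defaults are only illustrative; the argument names match the args.* references in the snippets below):
import argparse

import hpbandster.core.nameserver as hpns
from hpbandster.optimizers import BOHB

parser = argparse.ArgumentParser(description='Example 1 - sequential and local execution.')
parser.add_argument('--min_budget',   type=float, help='Minimum budget used during the optimization.', default=9)
parser.add_argument('--max_budget',   type=float, help='Maximum budget used during the optimization.', default=243)
parser.add_argument('--n_iterations', type=int,   help='Number of iterations performed by the optimizer.', default=4)
args = parser.parse_args()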

Step 1: Start a Nameserver

To initiate the communication between the worker(s) and the optimizer, HpBandSter requires a nameserver to be present. This is a small service that keeps track of all running processes and their IP addresses and ports. It is a building block that HpBandSter inherits from Pyro4. In this first example, we will run it on the loopback interface with the IP 127.0.0.1. Using the port=None argument will make it use the default port 9090. The run_id is used to identify individual runs and needs to be given to all other components as well (see below). For now, we just fix it to example1.
NS = hpns.NameServer(run_id='example1', host='127.0.0.1', port=None)
NS.start()

Step 2: Start a Worker

The worker implements the actual problem that is optimized. Having derived your worker from the base class and implemented the compute method, you can instantiate it with all arguments your specific __init__ requires plus the additional arguments from the base class. The bare minimum is the location of the nameserver and the run_id.
w = MyWorker(sleep_interval = 0, nameserver='127.0.0.1',run_id='example1')
w.run(background=True)

Step 3: Run an Optimizer

The optimizer decides which configurations are evaluated and how the budgets are distributed. Besides Random Search and HyperBand, there is BOHB, our own combination of HyperBand and Bayesian Optimization, which we will use here. Check out the list of available optimizers for more info.
At the very least, we have to provide the description of the search space, the run_id, the nameserver, and the budgets. The optimization starts when the run method is called with the number of iterations as the only mandatory argument.
bohb = BOHB(  configspace = w.get_configspace(),
              run_id = 'example1', nameserver='127.0.0.1',
              min_budget=args.min_budget, max_budget=args.max_budget
           )
res = bohb.run(n_iterations=args.n_iterations)

Step 4: Stop all services

After the run is finished, the services started above need to be shut down. This ensures that the worker, the nameserver and the master all properly exit and no (daemon) threads keep running afterwards. In particular, we shut down the optimizer (which in turn shuts down all workers) and the nameserver.
bohb.shutdown(shutdown_workers=True)
NS.shutdown()

Step 5: Analysis of the Results

After a run is finished, one might be interested in all kinds of information. HpBandSter offers full access to all evaluated configurations including timing information and potential error messages for failed runs. In this first example, we simply look up the best configuration (called incumbent), count the number of configurations and evaluations, and the total budget spent. For more details, see some of the other examples and the documentation of the Result class.
id2config = res.get_id2config_mapping()
incumbent = res.get_incumbent_id()

print('Best found configuration:', id2config[incumbent]['config'])
print('A total of %i unique configurations were sampled.' % len(id2config.keys()))
print('A total of %i runs were executed.' % len(res.get_all_runs()))
print('Total budget corresponds to %.1f full function evaluations.'%(sum([r.budget for r in res.get_all_runs()])/args.max_budget))
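As a small follow-up, the runs of the incumbent across the different budgets can be inspected as well. This is only a sketch using the Result class’s get_runs_by_id method; budget and loss are attributes stored for each run:
inc_runs = res.get_runs_by_id(incumbent)
for r in inc_runs:
    print('budget: %.1f, loss: %s' % (r.budget, r.loss))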
The complete source code for this example can be found here.

2. A Local Parallel Run using Threads

Let us now extend this example to start multiple workers, each in a separate thread. This is a useful mode to exploit a multicore CPU system if the individual workers get around Python’s global interpreter lock. For example, many scikit-learn algorithms outsource the heavy duty computations to some C module, making them run truly in parallel even when threaded.
Below, we instantiate the specified number of workers. To emphasize the effect, we introduce a sleep_interval of one second, which makes every function evaluation take a bit of time. Note the additional id argument that helps to separate the individual workers. This is necessary because every worker uses its process ID, which is the same for all threads here.
# Step 2: Start the workers
# Instantiate the specified number of workers, each with a sleep_interval of one
# second and an explicit id so the individual threads can be told apart.
workers = []
for i in range(args.n_workers):
    # run_id 'example2' and args.n_workers are assumptions mirroring the other examples
    w = MyWorker(sleep_interval=1, nameserver='127.0.0.1', run_id='example2', id=i)
    w.run(background=True)
    workers.append(w)
When starting the optimizer, we can add the min_n_workers argument to the run method to make the optimizer wait for all workers to start. This is not mandatory, and workers can be added at any time, but if the timing of the run is essential, this can be used to synchronize all workers right at the start.
res = bohb.run(n_iterations=args.n_iterations, min_n_workers=args.n_workers)
The source code can be found here. Try running it with a different number of workers by changing the --n_workers command line argument.

3. A Local Parallel Run using Different Processes

Before we can go to a distributed system, we shall first extend our toy example to run in different processes. In order to do that, we add the --worker flag
parser.add_argument('--worker', help='Flag to turn this into a worker process', action='store_true')
which will allow us to run the same script for dedicated workers. Those only have to instantiate the worker class and call its run method, but this time the worker runs in the foreground. After they have processed all their configurations and received the shutdown signal from the master, the workers simply exit.
if args.worker:
    # run_id and sleep_interval follow the pattern of the other examples
    w = MyWorker(sleep_interval=0.5, nameserver='127.0.0.1', run_id='example3')
    w.run(background=False)   # run in the foreground
    exit(0)
You can download the source code here. Try running the script in three different shells, twice with the --worker flag. To see what is happening, the logging level for this script is set to INFO, so messages from the optimizer and the workers are shown.

4. A Distributed Run on a Cluster with a Shared File System

Example 3 is already close to the setup for a distributed environment. The only things missing are providing a unique run id, looking up the hostname and distributing the nameserver information across all processes. So far, the run id was always hard coded, and the nameserver was running on localhost (127.0.0.1, which was also the hostname) on the default port. We now have to tell all processes which Network Interface Card (NIC) to use and where the nameserver is located. To that end, we introduce three new command line arguments:
parser.add_argument('--run_id', type=str, help='A unique run id for this optimization run. An easy option is to use the job id of the clusters scheduler.')
parser.add_argument('--nic_name',type=str, help='Which network interface to use for communication.')
parser.add_argument('--shared_directory',type=str, help='A directory that is accessible for all processes, e.g. a NFS share.')
The first two are self-explanatory, and we will use a shared directory to distribute the nameserver information to every worker.

Note

This is not the only way to distribute this information, but in our experience almost all clusters offer a shared file system accessible by every compute node. We have therefore implemented an easy solution for this scenario. If that does not cover your use case, you have to find another way to distribute the information about the nameserver to all workers. One option is to start a static nameserver, for example on the submission node of the cluster; that way, you can hard code its location into the script.
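If you go down that road, a minimal sketch could look like the one below (the host name and port are placeholders; the worker arguments mirror the ones used elsewhere in this example):
# on the node hosting the static nameserver (placeholder host name and port)
NS = hpns.NameServer(run_id=args.run_id, host='login-node.my-cluster.org', port=65300)
NS.start()

# in every worker script, hard code the same values instead of reading them from disk
w = MyWorker(sleep_interval=0.5, run_id=args.run_id, host=host,
             nameserver='login-node.my-cluster.org', nameserver_port=65300)
w.run(background=False)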

To find a valid host name, we can use the convenience function nic_name_to_host, which looks up a valid hostname for a given NIC.
host = hpns.nic_name_to_host(args.nic_name)
When creating the nameserver, we can provide the working_directory argument to make it store its hostname and port upon start. Both values are also returned by the start method so that we can use them in the master directly.
NS = hpns.NameServer(run_id=args.run_id, host=host, port=0, working_directory=args.shared_directory)
ns_host, ns_port = NS.start()
The workers can then simply retrieve that information by loading it from disk:
if args.worker:
	time.sleep(5)	# short artificial delay to make sure the nameserver is already running
	w = MyWorker(sleep_interval = 0.5,run_id=args.run_id, host=host)
	w.load_nameserver_credentials(working_directory=args.shared_directory)
	w.run(background=False)
	exit(0)
For the master, we can usually afford to run a worker in the background, as most optimizers have very little overhead.
w = MyWorker(sleep_interval = 0.5,run_id=args.run_id, host=host, nameserver=ns_host, nameserver_port=ns_port)
w.run(background=True)
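Setting up and running the optimizer then looks just like in the first example, except that the host and nameserver information is passed along as well. The sketch below assumes the same argparse arguments as before plus an args.n_workers analogous to the threaded example:
bohb = BOHB(  configspace = MyWorker.get_configspace(),
              run_id = args.run_id,
              host=host,
              nameserver=ns_host,
              nameserver_port=ns_port,
              min_budget=args.min_budget, max_budget=args.max_budget
           )
res = bohb.run(n_iterations=args.n_iterations, min_n_workers=args.n_workers)
bohb.shutdown(shutdown_workers=True)
NS.shutdown()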
We also provide the host, nameserver, and nameserver_port arguments to the optimizer. Once the run is done, we usually do not want to print out any information, but rather store the result for later analysis. Pickling the object returned by the optimizer’s run is a very easy way of doing that.
with open(os.path.join(args.shared_directory, 'results.pkl'), 'wb') as fh:
	pickle.dump(res, fh)
The full example can be found here. There you will also find an example shell script to submit the program on a cluster running the Sun Grid Engine.