Quickstart Guide¶
What is HpBandSter?¶
HpBandSter (HyperBand on STERoids) implements recently published methods for optimizing hyperparameters of machine learning algorithms. We designed HpBandSter such that it scales from running sequentially on a local machine to running in parallel on a distributed system.
One of the implemented algorithms is BOHB, which combines Bayesian Optimization and HyperBand to efficiently search for well-performing configurations. Learn more about this method by reading our paper, published at ICML 2018.
How to install HpBandSter¶
HpBandSter can be installed via pip for python3:
pip install hpbandster
If you want to develop on the code, you can install it via
git clone git@github.com:automl/HpBandSter.git
cd HpBandSter
python3 setup.py develop --user
Note
We only support Python3 for HpBandSter!
The basic Ingredients¶
Whether you want to use HpBandSter locally on your machine or on a cluster, the basic setup is always the same. For now, let’s focus on the most important ingredients needed to apply an optimizer to a new problem:
- Implementing a Worker: The worker is responsible for evaluating a given model with a single configuration on a single budget at a time.
- Defining the Search Space: Next, the parameters being optimized need to be defined. HpBandSter relies on the ConfigSpace package for that.
- Picking the Budgets and the Number of Iterations: To get good performance, HpBandSter needs to know meaningful budgets to use. You also have to specify how many iterations the optimizer performs.
1. Implementing a Worker¶
The Worker is responsible for evaluating a hyperparameter setting and returning the associated loss that is minimized.
By deriving from the base class Worker, encoding a new problem consists of implementing two methods: __init__ and compute.
The first allows you to perform initial computations, e.g. loading the dataset, when the worker is started, while the latter is called repeatedly during the optimization and evaluates a given configuration, yielding the associated loss.

import numpy
import time
import ConfigSpace as CS
from hpbandster.core.worker import Worker
class MyWorker(Worker):

    def __init__(self, *args, sleep_interval=0, **kwargs):
        super().__init__(*args, **kwargs)
        self.sleep_interval = sleep_interval

    def compute(self, config, budget, **kwargs):
        """
        Simple example for a compute function.
        The loss is just the config value plus some noise (that decreases with the budget).

        For dramatization, the function can sleep for a given interval to emphasize
        the speed ups achievable with parallel workers.

        Args:
            config: dictionary containing the sampled configurations by the optimizer
            budget: (float) amount of time/epochs/etc. the model can use to train

        Returns:
            dictionary with mandatory fields:
                'loss' (scalar)
                'info' (dict)
        """
        res = numpy.clip(config['x'] + numpy.random.randn()/budget, config['x']/2, 1.5*config['x'])
        time.sleep(self.sleep_interval)

        return({
            'loss': float(res),  # this is the mandatory field to run hyperband
            'info': res          # can be used for any user-defined information - also mandatory
        })

    @staticmethod
    def get_configspace():
        config_space = CS.ConfigurationSpace()
        config_space.add_hyperparameter(CS.UniformFloatHyperparameter('x', lower=0, upper=1))
        return(config_space)
2. The Search Space Definition¶
class MyWorker(Worker):

    @staticmethod
    def get_configspace():
        config_space = CS.ConfigurationSpace()
        config_space.add_hyperparameter(CS.UniformFloatHyperparameter('x', lower=0, upper=1))
        return(config_space)
Note
We also support integer and categorical hyperparameters. To express dependencies, the ConfigSpace package also allows you to define conditions and forbidden relations between parameters. For more examples we refer to the documentation of ConfigSpace, or have a look at the Advanced examples.
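As a quick, hedged illustration (the parameter names num_layers, optimizer and momentum are made up for this sketch and are not part of the quickstart worker), an integer, a categorical and a conditional hyperparameter could be added like this:

import ConfigSpace as CS

config_space = CS.ConfigurationSpace()

# an integer and a categorical hyperparameter (names are illustrative)
num_layers = CS.UniformIntegerHyperparameter('num_layers', lower=1, upper=5)
optimizer = CS.CategoricalHyperparameter('optimizer', ['Adam', 'SGD'])
momentum = CS.UniformFloatHyperparameter('momentum', lower=0.0, upper=0.99)
config_space.add_hyperparameters([num_layers, optimizer, momentum])

# momentum is only active (and only sampled) when the SGD optimizer is chosen
config_space.add_condition(CS.EqualsCondition(momentum, optimizer, 'SGD'))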
3. Meaningful Budgets and Number of Iterations¶
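In the toy example the budget is simply passed to the compute method (see the budget argument above), where it controls the amount of noise; in a real application it would be e.g. the number of epochs or the training time. The snippets below read min_budget, max_budget and the number of iterations from command-line arguments. A minimal sketch of how they could be exposed via argparse (the defaults for min_budget and n_iterations are illustrative; only max_budget=243 appears in the excerpts below):

import argparse

parser = argparse.ArgumentParser(description='HpBandSter example')
parser.add_argument('--min_budget', type=float, help='Minimum budget used during the optimization.', default=9)
parser.add_argument('--max_budget', type=float, help='Maximum budget used during the optimization.', default=243)
parser.add_argument('--n_iterations', type=int, help='Number of iterations performed by the optimizer.', default=4)
args = parser.parse_args()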
The first toy examples¶
- locally and sequentially
- locally and in parallel (thread based)
- locally and in parallel (process based)
- distributed in a cluster environment
1. A Local and Sequential Run¶
Step 1: Start a Nameserver
Every run needs a nameserver so that the optimizer and the workers can find each other; here it runs locally on the loopback interface.

import hpbandster.core.nameserver as hpns

NS = hpns.NameServer(run_id='example1', host='127.0.0.1', port=None)
NS.start()
Step 2: Start a Worker
After deriving from the base Worker class and implementing the compute method, the worker can easily be instantiated with all arguments your specific __init__ requires plus the additional arguments from the base class. The bare minimum is the location of the nameserver and the run_id.

w = MyWorker(sleep_interval=0, nameserver='127.0.0.1', run_id='example1')
w.run(background=True)
Step 3: Run an Optimizer
Besides Random Search and HyperBand, there is BOHB, our own combination of HyperBand and Bayesian Optimization, which we will use here.
Check out the list of available optimizers for more info.

from hpbandster.optimizers import BOHB

bohb = BOHB(configspace=w.get_configspace(),
            run_id='example1', nameserver='127.0.0.1',
            min_budget=args.min_budget, max_budget=args.max_budget)
res = bohb.run(n_iterations=args.n_iterations)
Step 4: Stop all services
bohb.shutdown(shutdown_workers=True)
NS.shutdown()
Step 5: Analysis of the Results
The run method returns a Result object that contains all sampled configurations and the corresponding runs, which we can use to print some basic statistics.

id2config = res.get_id2config_mapping()
incumbent = res.get_incumbent_id()

print('Best found configuration:', id2config[incumbent]['config'])
print('A total of %i unique configurations were sampled.' % len(id2config.keys()))
print('A total of %i runs were executed.' % len(res.get_all_runs()))
print('Total budget corresponds to %.1f full function evaluations.' % (sum([r.budget for r in res.get_all_runs()])/args.max_budget))
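If you want to dig a bit deeper, the Result object also gives access to the individual runs of each configuration. A small sketch, assuming the variables from the snippet above:

# all runs performed for the best configuration, one per budget it was evaluated on
inc_runs = res.get_runs_by_id(incumbent)

for r in inc_runs:
    print('budget %.1f -> loss %f' % (r.budget, r.loss))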
2. A Local Parallel Run using Threads¶
# Step 2: Start the workers
# Now we can instantiate the specified number of workers. To emphasize the effect,
# we introduce a sleep_interval of one second, which makes every function evaluation
# take a bit of time. Note the additional id argument that helps separating the
# individual workers. This is necessary because every worker uses its process ID
# as the default identifier; with several workers inside the same process, the
# explicit id keeps them apart.
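The corresponding code is not shown above. A minimal sketch of the worker instantiation that the comment describes (the number of workers args.n_workers and the run_id 'example2' are illustrative assumptions):

workers = []
for i in range(args.n_workers):
    # the explicit id separates workers that live in the same process
    w = MyWorker(sleep_interval=1, nameserver='127.0.0.1', run_id='example2', id=i)
    w.run(background=True)
    workers.append(w)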
3. A Local Parallel Run using Different Processes¶
parser.add_argument('--max_budget', type=float, help='Maximum budget used during the optimization.', default=243)
parser.add_argument('--worker', help='Flag to turn this into a worker process', action='store_true')
args = parser.parse_args()

if args.worker:
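    # Hedged sketch of the worker branch (the exact arguments in the full example
    # script may differ): when started with the --worker flag, this process runs a
    # single worker in the foreground and exits once the optimization is finished.
    # The script would typically be launched several times: once without the flag
    # as the optimizer, and once per additional worker with --worker.
    w = MyWorker(sleep_interval=0.5, nameserver='127.0.0.1', run_id='example3')
    w.run(background=False)
    exit(0)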