Components
In addition to the basic components mentioned in Getting Started, all other components are explained in the following sections to give a better picture of SMAC. These components all guide the optimization process, and simple changes can influence the results drastically.
Before diving into the components, we briefly explain the main Bayesian optimization loop in SMAC. The SMBO class receives all instantiated components from the facade and contains the main optimization logic. In general, a while loop asks for the next trial, submits it to the runner, and waits for the runner to finish the evaluation. Since the runner and the SMBO object are decoupled, the while loop can continue and ask for more trials, which can also be submitted to the runner. If the ask method (which is processed by the intensifier) returns a wait flag, no further trials are passed to the runner, and SMAC waits until trials have been evaluated and their results told to the intensifier.
The SMBO class also checks limits such as wallclock time and the number of remaining trials, and it calls the callbacks.
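The loop described above can be sketched in plain Python. This is a toy model under stated assumptions, not SMAC's SMBO implementation: ToyIntensifier and toy_optimize are hypothetical names, ask() returning None stands in for the wait flag, and the "runner" simply evaluates the oldest submitted trial whenever the loop has to wait.

```python
import time
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple


class ToyIntensifier:
    """Hypothetical stand-in for an intensifier: hands out queued configurations
    and asks the loop to wait once `max_pending` trials are still being evaluated."""

    def __init__(self, proposals: Sequence[float], max_pending: int = 2) -> None:
        self.proposals = deque(proposals)
        self.max_pending = max_pending
        self.pending = 0
        self.results: List[Tuple[float, float]] = []

    def ask(self) -> Optional[float]:
        # Returning None plays the role of the wait flag.
        if self.pending >= self.max_pending or not self.proposals:
            return None
        self.pending += 1
        return self.proposals.popleft()

    def tell(self, config: float, cost: float) -> None:
        self.pending -= 1
        self.results.append((config, cost))


def toy_optimize(
    target: Callable[[float], float],
    intensifier: ToyIntensifier,
    n_trials: int,
    walltime_limit: float = float("inf"),
    callbacks: Sequence[Callable[[float, float], None]] = (),
) -> List[Tuple[float, float]]:
    start = time.monotonic()
    running: deque = deque()  # trials submitted to the (decoupled) runner
    finished = 0
    # Limits (remaining trials, wallclock time) are checked in every iteration.
    while finished < n_trials and time.monotonic() - start < walltime_limit:
        config = intensifier.ask()
        if config is not None:
            running.append(config)  # submit and keep asking; do not block here
            continue
        if not running:
            break  # nothing left to propose or to wait for
        # Wait flag: block on the oldest running trial, then tell its result.
        done = running.popleft()
        cost = target(done)
        intensifier.tell(done, cost)
        finished += 1
        for callback in callbacks:
            callback(done, cost)
    return intensifier.results


seen: List[float] = []
results = toy_optimize(
    lambda x: (x - 2) ** 2,
    ToyIntensifier([1, 2, 3, 4]),
    n_trials=3,
    callbacks=[lambda config, cost: seen.append(config)],
)
print(results)  # [(1, 1), (2, 0), (3, 1)]
```

Note that the fourth configuration is already submitted when the trial limit is reached; with a decoupled runner, more trials can be in flight than are finally told.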
Surrogate Model
The surrogate model is used to approximate the objective function of configurations. In previous versions, the model was referred to as the Empirical Performance Model (EPM). Bayesian optimization is mostly associated with Gaussian processes. However, SMAC incorporates random forests as surrogate models, which makes it possible to optimize higher-dimensional and more complex spaces.
The data used to train the surrogate model are collected from the runhistory by the runhistory encoder. If budgets are involved, the highest budget that satisfies self._min_samples (defaults to 1) in smac.main.smbo is used. If no budgets are used, all observations are used.
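A minimal sketch of this budget selection rule, assuming trials are given as hypothetical (configuration name, budget) pairs where None means that no budget is used; select_training_data is an illustrative helper, not a SMAC function.

```python
from typing import Dict, List, Optional, Tuple

Trial = Tuple[str, Optional[float]]  # (configuration name, budget or None)


def select_training_data(trials: List[Trial], min_samples: int = 1) -> List[str]:
    budgets = [budget for _, budget in trials if budget is not None]
    if not budgets:
        # No budgets involved: every observation is used.
        return [config for config, _ in trials]
    # Count observations per budget.
    counts: Dict[float, int] = {}
    for budget in budgets:
        counts[budget] = counts.get(budget, 0) + 1
    # Only budgets with at least `min_samples` observations are eligible.
    eligible = [budget for budget, n in counts.items() if n >= min_samples]
    if not eligible:
        return []
    highest = max(eligible)
    return [config for config, budget in trials if budget == highest]


trials = [("a", 10.0), ("b", 10.0), ("c", 30.0), ("d", 90.0)]
print(select_training_data(trials))                 # ['d']
print(select_training_data(trials, min_samples=2))  # ['a', 'b']
```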
If you are using instances, it is recommended to use instance features. The model is trained on each instance together with its features. Imagine you have two hyperparameters, two instances, and no instance features; the model would then be trained on:
| HP 1 | HP 2 | Objective Value |
|---|---|---|
| 0.1 | 0.8 | 0.5 |
| 0.1 | 0.8 | 0.75 |
| 505 | 7 | 2.4 |
| 505 | 7 | 1.3 |
You can see that the same inputs lead to different objective values because they were evaluated on two different instances. If you associate each instance with a feature, you end up with the following data points:
| HP 1 | HP 2 | Instance Feature | Objective Value |
|---|---|---|---|
| 0.1 | 0.8 | 0 | 0.5 |
| 0.1 | 0.8 | 1 | 0.75 |
| 505 | 7 | 0 | 2.4 |
| 505 | 7 | 1 | 1.3 |
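The second table corresponds to appending each instance's feature vector to the hyperparameter values of a trial. A sketch with hypothetical instance names and a hypothetical helper build_training_rows:

```python
from typing import Dict, List, Sequence, Tuple


def build_training_rows(
    trials: Sequence[Tuple[Sequence[float], str, float]],
    instance_features: Dict[str, Sequence[float]],
) -> Tuple[List[List[float]], List[float]]:
    """Turn (hyperparameters, instance, cost) trials into model inputs X and targets y."""
    X: List[List[float]] = []
    y: List[float] = []
    for hp_values, instance, cost in trials:
        # The instance feature makes otherwise identical inputs distinguishable.
        X.append(list(hp_values) + list(instance_features[instance]))
        y.append(cost)
    return X, y


trials = [
    ((0.1, 0.8), "instance_a", 0.5),
    ((0.1, 0.8), "instance_b", 0.75),
    ((505, 7), "instance_a", 2.4),
    ((505, 7), "instance_b", 1.3),
]
features = {"instance_a": [0], "instance_b": [1]}
X, y = build_training_rows(trials, features)
print(X)  # [[0.1, 0.8, 0], [0.1, 0.8, 1], [505, 7, 0], [505, 7, 1]]
```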
In detail, the data are collected as follows:
1. The intensifier requests new configurations via get_next_configurations.
2. smac.main.smbo.SMBO collects the data via the runhistory encoder, which iterates over the runhistory trials.
3. The runhistory encoder only collects trials that are in considered_states, plus timeout trials. Also, only the highest budget is considered if budgets are used. In this step, multi-objective values are scalarized using the normalize_costs function (which uses objective_bounds from the runhistory) and the multi-objective algorithm.
4. The selected trial objectives are transformed (e.g., log-transformed, depending on the selected encoder).
5. The hyperparameters might still have inactive values. The model takes care of that after the collected data are passed from the SMBO object to the model.
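The filtering in step 3 and the transformation in step 4 can be sketched as follows (normalization and scalarization are left out here). The Status enum, encode_costs, and the clipping constant eps are assumptions for illustration; they only mirror the idea of filtering by considered_states and applying a log transform, not SMAC's encoder code.

```python
import math
from enum import Enum
from typing import List, Sequence, Tuple


class Status(Enum):
    """Hypothetical stand-in for SMAC's trial status type."""
    SUCCESS = "success"
    CRASHED = "crashed"
    TIMEOUT = "timeout"
    RUNNING = "running"


def encode_costs(
    trials: Sequence[Tuple[Status, float]],
    considered_states: Sequence[Status],
    log_transform: bool = True,
    eps: float = 1e-10,
) -> List[float]:
    # Keep trials whose status is considered, plus timeout trials.
    kept = [cost for status, cost in trials if status in considered_states or status is Status.TIMEOUT]
    if not log_transform:
        return kept
    # Clip at eps so that a cost of 0 does not make math.log fail.
    return [math.log(max(cost, eps)) for cost in kept]


trials = [(Status.SUCCESS, 1.0), (Status.RUNNING, 5.0), (Status.TIMEOUT, math.e), (Status.CRASHED, 100.0)]
encoded = encode_costs(trials, considered_states=[Status.SUCCESS])
```

Here the running and crashed trials are dropped, and the remaining costs 1 and e are mapped to roughly 0 and 1.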
Acquisition Function
Acquisition functions are mathematical techniques that guide how the parameter space should be explored during Bayesian optimization. They use the predicted mean and predicted variance generated by the surrogate model.
The acquisition function is used by the acquisition maximizer (see next section). SMAC provides several acquisition functions: Lower Confidence Bound, Expected Improvement, Probability of Improvement, Thompson sampling, integrated acquisition functions, and prior acquisition functions. We refer to the literature for more information about acquisition functions.
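To illustrate how the predicted mean and variance are combined, here is a sketch of Expected Improvement and Lower Confidence Bound for a minimization problem, using only the Python standard library. The function names, xi, and kappa are illustrative assumptions; SMAC's own implementations may differ in details such as parameter names and scaling. Both functions return values where larger means more promising, because the maximizer searches for the highest acquisition value.

```python
import math


def _normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, variance: float, best_cost: float, xi: float = 0.0) -> float:
    """Expected improvement over best_cost when lower costs are better."""
    improvement = best_cost - mean - xi
    std = math.sqrt(variance)
    if std == 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * _normal_cdf(z) + std * _normal_pdf(z)


def lower_confidence_bound(mean: float, variance: float, kappa: float = 1.0) -> float:
    """Negated LCB (mean - kappa * std), so that larger values are better."""
    return -(mean - kappa * math.sqrt(variance))


# Same predicted mean, different uncertainty: EI prefers the more uncertain point.
certain = expected_improvement(mean=1.0, variance=0.01, best_cost=1.0)
uncertain = expected_improvement(mean=1.0, variance=1.0, best_cost=1.0)
```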
Note
The acquisition function calculates the acquisition value for each configuration. However, the configurations themselves are provided by the acquisition maximizer, which is therefore responsible for returning the next configurations.
Acquisition Maximizer
The acquisition maximizer is a wrapper around the acquisition function and returns the next configurations. SMAC supports local search, (sorted) random search, local and (sorted) random search, and differential evolution. While local search checks the neighbours of the best configurations, random search makes sure that the configuration space is explored. When using sorted random search, the random configurations are sorted by their acquisition function value.
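A toy sketch of these strategies on a continuous unit hypercube. All names are hypothetical, and toy_acquisition stands in for a real acquisition function. The sketch combines sorted random search with a simple hill-climbing local search started from the best random candidates, which mirrors the idea of "local and sorted random search" rather than SMAC's actual maximizer.

```python
import random
from typing import Callable, List

Config = List[float]
Acquisition = Callable[[Config], float]


def sorted_random_search(acq: Acquisition, n: int, dim: int, rng: random.Random) -> List[Config]:
    """Sample n random configurations and sort them by acquisition value (best first)."""
    candidates = [[rng.random() for _ in range(dim)] for _ in range(n)]
    return sorted(candidates, key=acq, reverse=True)


def neighbours(config: Config, step: float) -> List[Config]:
    """Move one hyperparameter up or down by `step`, clipped to [0, 1]."""
    result = []
    for i in range(len(config)):
        for delta in (-step, step):
            neighbour = list(config)
            neighbour[i] = min(1.0, max(0.0, neighbour[i] + delta))
            result.append(neighbour)
    return result


def local_search(acq: Acquisition, start: Config, step: float = 0.05, max_steps: int = 100) -> Config:
    """Hill climbing: move to the best neighbour until no neighbour is better."""
    current = list(start)
    for _ in range(max_steps):
        best_neighbour = max(neighbours(current, step), key=acq)
        if acq(best_neighbour) <= acq(current):
            break
        current = best_neighbour
    return current


def maximize(acq: Acquisition, dim: int, n_random: int = 100, n_local: int = 5, seed: int = 0) -> List[Config]:
    rng = random.Random(seed)
    ranked = sorted_random_search(acq, n_random, dim, rng)
    # Local search starts from the best random candidates.
    improved = [local_search(acq, config) for config in ranked[:n_local]]
    return sorted(improved + ranked, key=acq, reverse=True)


def toy_acquisition(config: Config) -> float:
    # Peaks at (0.5, ..., 0.5); stands in for EI or LCB values.
    return -sum((x - 0.5) ** 2 for x in config)


challengers = maximize(toy_acquisition, dim=2)
```

The returned list plays the role of the challengers; its length (here n_random + n_local) is what the warning below refers to.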
Warning
Pay attention to the number of challengers: if you experience RAM issues or long computation times in the acquisition function, consider lowering the number of challengers.
The acquisition maximizer also incorporates the random design. Please see the ChallengerList for more information.
Initial Design
The surrogate model needs data to be trained. Therefore, the initial design is used to generate the initial data points. We provide random, Latin hypercube, Sobol, factorial, and default initial designs. The default initial design uses the default configuration from the configuration space, and the factorial initial design generates the corner points of the configuration space. Sobol sequences are an example of quasi-random low-discrepancy sequences, and the Latin hypercube design is a statistical method for generating a near-random sample of parameter values from a multidimensional distribution.
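Two of these designs are easy to sketch with the standard library: a Latin hypercube sample on [0, 1)^d and the corner points used by a factorial design. Sobol sequences are omitted because they need a dedicated generator. The function names are hypothetical, and the sketch ignores categorical and conditional hyperparameters.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def latin_hypercube(n_points: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Each dimension is split into n_points equal strata; every stratum is hit exactly once."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        # One random value inside each stratum [i/n, (i+1)/n) ...
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        # ... shuffled so that strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


def factorial_corners(bounds: Sequence[Tuple[float, float]]) -> List[Tuple[float, ...]]:
    """All combinations of lower and upper bounds, i.e. the corners of the box."""
    return list(itertools.product(*bounds))


points = latin_hypercube(5, 3)
print(factorial_corners([(0, 1), (10, 20)]))  # [(0, 10), (0, 20), (1, 10), (1, 20)]
```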
Random Design
The random design is used in the acquisition maximizer to decide whether the next configuration should be random or sampled from the acquisition function. For example, with a random design probability of 50%, there is a 50% chance of sampling a random configuration and a 50% chance of sampling a configuration from the acquisition function (although the acquisition function already includes an exploration-exploitation trade-off). This design ensures that the optimization process does not get stuck in a local optimum and that, given enough time, the best configuration is guaranteed to be found.
In addition to the simple probability random design, we also provide annealing and modulus random designs.
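The three random designs differ only in how they decide whether iteration i should use a random configuration. In the sketch below, the class names and the check(iteration) method are illustrative. The annealing variant uses a simple geometric decay, which may differ from the schedule SMAC uses.

```python
import random


class ProbabilityRandomDesign:
    """Use a random configuration with a fixed probability."""

    def __init__(self, probability: float, seed: int = 0) -> None:
        self.probability = probability
        self.rng = random.Random(seed)

    def check(self, iteration: int) -> bool:
        return self.rng.random() < self.probability


class ModulusRandomDesign:
    """Use a random configuration in every `modulus`-th iteration."""

    def __init__(self, modulus: int) -> None:
        self.modulus = modulus

    def check(self, iteration: int) -> bool:
        return iteration % self.modulus == 0


class GeometricAnnealingRandomDesign:
    """The probability starts at `initial` and decays geometrically towards `final`."""

    def __init__(self, initial: float, final: float, decay: float, seed: int = 0) -> None:
        self.initial, self.final, self.decay = initial, final, decay
        self.rng = random.Random(seed)

    def probability(self, iteration: int) -> float:
        return max(self.final, self.initial * self.decay ** iteration)

    def check(self, iteration: int) -> bool:
        return self.rng.random() < self.probability(iteration)


modulus = ModulusRandomDesign(modulus=3)
print([modulus.check(i) for i in range(6)])  # [True, False, False, True, False, False]
```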
Intensifier
The intensifier compares different configurations based on the trials evaluated so far. It decides which configuration should be intensified, in other words, whether a configuration is worth spending more time on (e.g., evaluating it with another seed, on another instance, or on a higher budget).
Warning
Always pay attention to max_config_calls: if this argument is set high, the intensifier might spend a lot of time on a single configuration. Also, since the default Intensifier depends on runtime, reproducibility is not given unless you set intensify_percentage to 0.
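The core idea of intensification can be sketched as a simple race: the challenger is evaluated on the incumbent's instances one at a time and dropped as soon as it looks worse on the instances both share. The helper race, the cap via a max_config_calls argument, and the mean-cost comparison are simplifying assumptions; SMAC's intensifier additionally handles seeds, budgets, and its own bookkeeping.

```python
from typing import Callable, Dict, List, Tuple


def race(
    challenger: str,
    incumbent_costs: Dict[str, float],      # instance -> cost of the incumbent
    evaluate: Callable[[str, str], float],  # (config, instance) -> cost
    max_config_calls: int,
) -> bool:
    """Return True if the challenger should replace the incumbent."""
    challenger_costs: Dict[str, float] = {}
    # Never spend more than max_config_calls evaluations on the challenger.
    for instance in list(incumbent_costs)[:max_config_calls]:
        challenger_costs[instance] = evaluate(challenger, instance)
        shared = list(challenger_costs)
        challenger_mean = sum(challenger_costs.values()) / len(shared)
        incumbent_mean = sum(incumbent_costs[i] for i in shared) / len(shared)
        if challenger_mean > incumbent_mean:
            return False  # not worth intensifying any further
    return True


calls: List[Tuple[str, str]] = []


def evaluate(config: str, instance: str) -> float:
    calls.append((config, instance))
    return {"good": 0.5, "bad": 2.0}[config]


incumbent = {"i1": 1.0, "i2": 1.0, "i3": 1.0}
print(race("bad", incumbent, evaluate, max_config_calls=3))  # False, after a single evaluation
```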
Depending on the components and arguments, the intensifier tells you which seeds, budgets, and/or instances are used throughout the optimization process. You can use the methods uses_seeds, uses_budgets, and uses_instances (directly callable via the facade) to (sanity-)check whether the intensifier uses these arguments.
If you want to know the exact values, use get_target_function_seeds, get_target_function_budgets, and get_target_function_instances.
Multi-Objective Algorithm
The multi-objective algorithm is used to scalarize multi-objective values. It receives normalized objective values and returns a single value. The resulting value (computed when the runhistory encoder calls the algorithm) is then used to train the surrogate model.
The runhistory also has access to the multi-objective algorithm, which plays a role in the method get_cost. The method get_cost is used to compare configurations in the intensifier and therefore to determine the incumbent.
Warning
Depending on the multi-objective algorithm, the incumbent might be ambiguous because there might be multiple incumbents on the Pareto front. Take ParEGO as an example: every time a new configuration is sampled, the objective weights are updated (see runhistory encoder). Therefore, calling the get_incumbent method of the runhistory might return a different configuration depending on the internal state of the multi-objective algorithm.
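The effect described in the warning can be reproduced with a ParEGO-style augmented Tchebycheff scalarization, max_i(w_i * c_i) + rho * sum_i(w_i * c_i), applied to normalized costs. The weights, rho = 0.05, uniformly drawn normalized weights, and the helper names are assumptions for illustration: with the same two configurations, changing the weights changes which one looks best.

```python
import random
from typing import Dict, List, Sequence


def parego_scalarize(normalized_costs: Sequence[float], weights: Sequence[float], rho: float = 0.05) -> float:
    """Augmented Tchebycheff scalarization of normalized costs (lower is better)."""
    weighted = [w * c for w, c in zip(weights, normalized_costs)]
    return max(weighted) + rho * sum(weighted)


def random_weights(n_objectives: int, rng: random.Random) -> List[float]:
    """Draw new weights that sum to 1 (redrawn whenever a new configuration is sampled)."""
    raw = [rng.random() for _ in range(n_objectives)]
    total = sum(raw)
    return [r / total for r in raw]


def incumbent(costs: Dict[str, Sequence[float]], weights: Sequence[float]) -> str:
    return min(costs, key=lambda name: parego_scalarize(costs[name], weights))


costs = {"A": [0.1, 0.9], "B": [0.9, 0.1]}  # two configurations on the Pareto front
print(incumbent(costs, [0.9, 0.1]))  # A
print(incumbent(costs, [0.1, 0.9]))  # B
weights = random_weights(2, random.Random(0))
```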
RunHistory
The runhistory holds all (un-)evaluated trials of the optimization run. You can use the runhistory to get configs, the incumbent, (min/sum/average) cost of configs, trials of a config, and more. The runhistory encoder iterates over the runhistory to receive data for the surrogate model. The following code shows how to iterate over the runhistory:
from smac import HyperparameterOptimizationFacade as HPOFacade

smac = HPOFacade(...)

# Iterate over all trials
for trial_key, trial_value in smac.runhistory.items():
    # Trial key: the runhistory stores the config ID, so look up the config itself
    config = smac.runhistory.get_config(trial_key.config_id)
    instance = trial_key.instance
    seed = trial_key.seed
    budget = trial_key.budget

    # Trial value
    cost = trial_value.cost
    time = trial_value.time
    status = trial_value.status
    starttime = trial_value.starttime
    endtime = trial_value.endtime
    additional_info = trial_value.additional_info

# Iterate over all configs
for config in smac.runhistory.get_configs():
    # Get the average cost over all trials of this config
    average_cost = smac.runhistory.average_cost(config)
RunHistory Encoder
The runhistory encoder is used to encode the runhistory data into a format that can be used by the surrogate model. Only trials whose status is in considered_states, as well as timeout trials, are considered. Multi-objective values are first normalized using the normalize_costs function (which uses objective_bounds from the runhistory). Afterwards, the normalized values are scalarized by the multi-objective algorithm.
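A minimal sketch of the normalization step, assuming objective_bounds holds the observed (min, max) of each objective and using a plain mean as the multi-objective algorithm. The helper names are hypothetical, and mapping a zero-width range to 1.0 is an arbitrary choice for this sketch.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]


def objective_bounds(all_costs: Sequence[Sequence[float]]) -> Bounds:
    """Observed (min, max) of every objective over all trials."""
    return [(min(column), max(column)) for column in zip(*all_costs)]


def normalize_costs(costs: Sequence[float], bounds: Bounds) -> List[float]:
    """Min-max normalize each objective so that all objectives share the scale [0, 1]."""
    normalized = []
    for cost, (low, high) in zip(costs, bounds):
        if high - low == 0:
            normalized.append(1.0)  # arbitrary choice for a degenerate range
        else:
            normalized.append((cost - low) / (high - low))
    return normalized


def mean_aggregation(normalized: Sequence[float]) -> float:
    return sum(normalized) / len(normalized)


all_costs = [[1.0, 100.0], [3.0, 300.0], [2.0, 200.0]]  # two objectives on very different scales
bounds = objective_bounds(all_costs)
scalarized = [mean_aggregation(normalize_costs(c, bounds)) for c in all_costs]
print(scalarized)  # [0.0, 1.0, 0.5]
```

Without the normalization, the second objective would dominate the mean simply because of its larger scale.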
Callback
Callbacks provide the ability to easily execute code before, inside, and after the Bayesian optimization loop.
To add a callback, inherit from smac.Callback and override the methods you need.
Afterwards, you can pass the callbacks to any facade.
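from __future__ import annotations

from smac import Callback, MultiFidelityFacade
from smac.main.smbo import SMBO
from smac.runhistory import TrialInfo, TrialValue


class CustomCallback(Callback):
    def on_start(self, smbo: SMBO) -> None:
        pass

    def on_end(self, smbo: SMBO) -> None:
        pass

    def on_iteration_start(self, smbo: SMBO) -> None:
        pass

    def on_iteration_end(self, smbo: SMBO) -> None:
        pass

    def on_tell_end(self, smbo: SMBO, info: TrialInfo, value: TrialValue) -> bool | None:
        # Called after a trial result has been told; we just print it here.
        # Returning False would stop the optimization gracefully.
        print(info, value)
        return None


smac = MultiFidelityFacade(
    ...,  # other arguments elided
    callbacks=[CustomCallback()],
)
smac.optimize()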