Components¶
In addition to the basic components mentioned in Getting Started, all other components are explained in the following paragraphs to give a better picture of SMAC. These components all guide the optimization process, and simple changes can influence the results drastically.
Before diving into the components, we briefly explain the main Bayesian optimization loop in SMAC. The SMBO receives all instantiated components from the facade, and the main logic happens there. In general, a while loop is used to ask for the next trial, submit it to the runner, and wait for the runner to finish the evaluation. Since the runner and the SMBO object are decoupled, the while loop continues and asks for more trials (e.g., in case of multi-threading), which can also be submitted to the runner. If all workers are occupied, SMAC waits until a worker becomes available again. Moreover, limits like wallclock time and remaining trials are checked in every iteration.
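The decoupling of the SMBO loop and the runner can be sketched with a toy example. All names here (`ask`, `main_loop`, the dictionary-based trials) are illustrative stand-ins, not SMAC's actual API:

```python
from collections import deque


def ask():
    """Illustrative stand-in for SMBO asking the intensifier for the next trial."""
    for i in range(5):
        yield {"trial_id": i}


def main_loop(n_workers: int = 2) -> list[dict]:
    """A toy version of the decoupled loop: keep asking for trials while
    workers are free; wait for a result when all workers are occupied."""
    running: deque = deque()
    finished = []

    for trial in ask():
        # In SMAC, limits (wallclock time, remaining trials) are checked here.
        running.append(trial)  # submit the trial to the "runner"

        # All workers occupied: wait until one finishes (toy: pop the oldest).
        if len(running) >= n_workers:
            done = running.popleft()
            finished.append({**done, "cost": done["trial_id"] * 0.1})

    # Drain the remaining running trials.
    while running:
        done = running.popleft()
        finished.append({**done, "cost": done["trial_id"] * 0.1})

    return finished


results = main_loop()
```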
Surrogate Model¶
The surrogate model is used to approximate the objective function of configurations. In previous versions, the model was referred to as the Empirical Performance Model (EPM). Bayesian optimization is mostly associated with Gaussian processes. However, SMAC also incorporates random forests as surrogate models, which makes it possible to optimize higher-dimensional and complex spaces.
The data used to train the surrogate model is collected by the runhistory encoder (which receives data from the runhistory and transforms it). If budgets are involved, the highest budget which satisfies min_trials (defaults to 1) in smac.main.config_selector is used. If no budgets are used, all observations are used.
If you are using instances, it is recommended to use instance features. The model is trained on each instance associated with its features. Imagine you have two hyperparameters, two instances, and no instance features; the model would then be trained on:
| HP 1 | HP 2 | Objective Value |
|---|---|---|
| 0.1 | 0.8 | 0.5 |
| 0.1 | 0.8 | 0.75 |
| 505 | 7 | 2.4 |
| 505 | 7 | 1.3 |
You can see that the same inputs lead to different objective values because of the two instances. If you associate each instance with a feature, you end up with the following data points:
| HP 1 | HP 2 | Instance Feature | Objective Value |
|---|---|---|---|
| 0.1 | 0.8 | 0 | 0.5 |
| 0.1 | 0.8 | 1 | 0.75 |
| 505 | 7 | 0 | 2.4 |
| 505 | 7 | 1 | 1.3 |
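The effect of the instance feature can be illustrated by building the training data by hand. This is a toy sketch of the idea (the trial tuples and the one-dimensional feature mapping are made up for illustration), not SMAC's internal encoding:

```python
# Trials from the tables above: (hp1, hp2, instance, objective value)
trials = [
    (0.1, 0.8, "instance_a", 0.5),
    (0.1, 0.8, "instance_b", 0.75),
    (505, 7, "instance_a", 2.4),
    (505, 7, "instance_b", 1.3),
]

# Without instance features, the model sees duplicate inputs
# mapped to different targets:
X_plain = [(hp1, hp2) for hp1, hp2, _, _ in trials]
assert len(set(X_plain)) < len(X_plain)  # ambiguous inputs

# With a (hypothetical) one-dimensional feature per instance,
# every input row becomes unique:
features = {"instance_a": 0, "instance_b": 1}
X_feat = [(hp1, hp2, features[inst]) for hp1, hp2, inst, _ in trials]
y = [cost for *_, cost in trials]
assert len(set(X_feat)) == len(X_feat)  # unambiguous
```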
The steps to receiving data are as follows:
1. The intensifier requests new configurations via `next(self.config_generator)`.
2. The config selector collects the data via the runhistory encoder, which iterates over the runhistory trials.
3. The runhistory encoder only collects trials whose status is in `considered_states`, as well as timeout trials. If budgets are used, only the highest budget is considered. In this step, multi-objective values are scalarized using the `normalize_costs` function (which uses `objective_bounds` from the runhistory) and the multi-objective algorithm. For example, when ParEGO is used, the scalarization is different in each training.
4. The selected trial objectives are transformed (e.g., log-transformed, depending on the selected encoder).
The hyperparameters might still have inactive values; the model takes care of these after the collected data are passed to it.
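The objective transformation in the last step can be sketched as a toy log-scaling. This is illustrative only; SMAC's log-transforming encoders implement their own scaling:

```python
import math


def log_transform(costs: list[float]) -> list[float]:
    """Toy log-scaling of (positive) objective values, in the spirit of a
    log-transforming runhistory encoder. Not SMAC's actual implementation."""
    return [math.log(c) for c in costs]


# Multiplicative differences in cost become additive after the transform:
transformed = log_transform([1.0, math.e, math.e ** 2])
```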
Acquisition Function¶
Acquisition functions are mathematical techniques that guide how the parameter space should be explored during Bayesian optimization. They use the predicted mean and predicted variance generated by the surrogate model.
The acquisition function is used by the acquisition maximizer (see next section). SMAC provides a variety of acquisition functions (Lower Confidence Bound, Expected Improvement, Probability of Improvement, Thompson sampling, integrated acquisition functions, and prior acquisition functions). We refer to the literature for more information about acquisition functions.
Note
The acquisition function calculates the acquisition value for each configuration. However, the configurations are provided by the acquisition maximizer. Therefore, the acquisition maximizer is responsible for obtaining the next configurations.
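As an illustration, Expected Improvement under a Gaussian predictive distribution can be computed from the surrogate's predicted mean and variance. This is the standard textbook formula, not SMAC's internal code:

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimization: E[max(best - f(x), 0)] with f(x) ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf
```

A point with a lower predicted mean (better for minimization) yields a higher EI, while a zero-variance point no better than the incumbent yields zero.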
Acquisition Maximizer¶
The acquisition maximizer is a wrapper for the acquisition function and returns the next configurations. SMAC supports local search, (sorted) random search, a combination of local and (sorted) random search, and differential evolution. While local search checks the neighbourhood of the best configurations, random search ensures exploration of the configuration space. When using sorted random search, random configurations are sorted by their acquisition value.
Warning
Pay attention to the number of challengers: If you experience RAM issues or long computational times in the acquisition function, you might lower the number of challengers.
The acquisition maximizer also incorporates the Random Design. Please see the ChallengerList for more information.
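A sorted random search maximizer can be sketched in a few lines: sample random configurations and order them by acquisition value. This is illustrative only (the one-dimensional configurations and the toy acquisition function are made up); SMAC's maximizers are more involved and return a ChallengerList:

```python
import random


def sorted_random_search(acquisition, n_candidates: int = 100, seed: int = 0):
    """Sample random configurations and sort them by acquisition value,
    best (highest) first."""
    rng = random.Random(seed)
    candidates = [{"x": rng.uniform(-5.0, 5.0)} for _ in range(n_candidates)]
    return sorted(candidates, key=acquisition, reverse=True)


# Toy acquisition function: prefer configurations close to x = 2.
challengers = sorted_random_search(lambda c: -abs(c["x"] - 2.0))
```

Lowering `n_candidates` is the toy analogue of lowering the number of challengers mentioned in the warning above.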
Initial Design¶
The surrogate model needs data to be trained. Therefore, the initial design is used to generate the initial data points. We provide random, latin hypercube, sobol, factorial, and default initial designs. The default initial design uses the default configuration from the configuration space, and the factorial initial design generates corner points of the configuration space. Sobol sequences are an example of quasi-random low-discrepancy sequences, and the latin hypercube design is a statistical method for generating a near-random sample of parameter values from a multidimensional distribution.
The initial design configurations are yielded by the config selector first. Moreover, the config selector keeps track of which configurations already have been returned to make sure a configuration is not returned twice.
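A latin hypercube design, for instance, places exactly one sample in each of n equally sized strata per dimension. Below is a minimal stdlib sketch of that idea, not SMAC's implementation (which builds on established sampling libraries):

```python
import random


def latin_hypercube(n: int, dim: int, seed: int = 0) -> list[list[float]]:
    """One point per stratum in each dimension of the unit cube, with the
    strata shuffled independently per dimension."""
    rng = random.Random(seed)
    samples = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        for i, s in enumerate(strata):
            # Sample uniformly inside stratum s, which has width 1/n.
            samples[i][d] = (s + rng.random()) / n
    return samples


points = latin_hypercube(n=8, dim=2)
```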
Random Design¶
The random design is used in the acquisition maximizer to decide whether the next configuration should be random or sampled from the acquisition function. For example, if we use a random design with a probability of 50%, we have a 50% chance to sample a random configuration and a 50% chance to sample a configuration from the acquisition function (although the acquisition function already includes an exploration-exploitation trade-off). This design ensures that the optimization process does not get stuck in a local optimum and that, over time, the best configuration is found.
In addition to the simple probability-based random design, we also provide annealing and modulus random designs.
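The probability and modulus variants can be sketched as simple per-iteration predicates deciding whether the next configuration is random. The function names and parameters are illustrative, not SMAC's classes:

```python
import random


def probability_design(rng: random.Random, probability: float = 0.5) -> bool:
    """Return True if the next configuration should be drawn at random."""
    return rng.random() < probability


def modulus_design(iteration: int, modulus: int = 3) -> bool:
    """Every `modulus`-th configuration is random."""
    return iteration % modulus == 0


rng = random.Random(0)
decisions = [probability_design(rng) for _ in range(1000)]
```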
Intensifier¶
The intensifier compares different configurations based on the trials evaluated so far. It decides whether a configuration should be intensified, in other words, whether a configuration is worth spending more time on (e.g., evaluating another seed, evaluating on another instance, or evaluating on a higher budget).
Warning
Always pay attention to max_config_calls or n_seeds: if set high, the intensifier might spend a lot of time on a single configuration.
Depending on the components and arguments, the intensifier tells you which seeds, budgets, and/or instances are used throughout the optimization process. You can use the methods uses_seeds, uses_budgets, and uses_instances (directly callable via the facade) to (sanity-)check whether the intensifier uses these arguments.
Another important fact is that the intensifier keeps track of the current incumbent (a.k.a. the best configuration found so far). In the multi-objective case, multiple incumbents may be found.
All intensifiers support multi-objective, multi-fidelity, and multi-threading:
- Multi-Objective: Keeping track of multiple incumbents at once.
- Multi-Fidelity: Incorporating instances or budgets.
- Multi-Threading: Intensifiers are implemented as generators so that calling `next` on the intensifier can be repeated as often as needed. Intensifiers are not required to receive results, as results are taken directly from the runhistory.
Note
All intensifiers operate on the runhistory and recognize previously logged trials (e.g., if the user already evaluated something beforehand). Previous configurations (in the best case, also complete trials) are added to the queue/tracker again so that they are integrated into the intensification process.
This means that continuing a run, as well as incorporating user input, is natively supported.
Configuration Selector¶
The configuration selector uses the initial design, surrogate model, acquisition maximizer/function, runhistory, runhistory encoder, and random design to select the next configuration. The configuration selector is directly used by the intensifier and is called every time a new configuration is requested.
The idea behind the configuration selector is straightforward:
1. Yield the initial design configurations.
2. Train the surrogate model with the data from the runhistory encoder.
3. Get the next `retrain_after` configurations from the acquisition function/maximizer and yield them.
4. After all `retrain_after` configurations have been yielded, go back to step 2.
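The steps above can be sketched as a generator. The model and maximizer classes here are hypothetical stand-ins for the actual components:

```python
def config_selector(initial_design, model, maximizer, retrain_after: int = 8):
    """Toy config selector: yield the initial design first, then batches of
    `retrain_after` configurations, retraining the model before each batch."""
    for config in initial_design:  # step 1
        yield config

    while True:
        model.train()  # step 2: train on runhistory-encoder data
        for _ in range(retrain_after):  # step 3
            yield maximizer.next_config()
        # step 4: loop back and retrain


class ToyModel:
    def __init__(self):
        self.trainings = 0

    def train(self):
        self.trainings += 1


class ToyMaximizer:
    def __init__(self):
        self.count = 0

    def next_config(self):
        self.count += 1
        return {"id": self.count}


model = ToyModel()
selector = config_selector(["default", "random0"], model, ToyMaximizer(), retrain_after=2)
configs = [next(selector) for _ in range(6)]
```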
Note
The configuration selector is a generator and yields configurations. Therefore, the current state of the selector is saved, and when the intensifier calls `next`, the selector continues where it stopped.
Note
Every time the surrogate model is trained, the multi-objective algorithm is updated via `update_on_iteration_start`.
Multi-Objective Algorithm¶
The multi-objective algorithm is used to scalarize multi-objective values. It receives normalized objective values and returns a single value. The resulting value (requested by the runhistory encoder) is then used to train the surrogate model.
Warning
Depending on the multi-objective algorithm, the values produced for the runhistory encoder might differ each time the surrogate model is trained. Take ParEGO, for example: every time a new configuration is sampled (see ConfigSelector), the objective weights are updated. Therefore, the scalarized values are different, and the acquisition maximizer might return completely different configurations.
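This ParEGO behaviour can be illustrated with the augmented Tchebycheff scalarization: with freshly drawn weights, the same normalized costs map to a different scalar value each time. This is a sketch of the idea, not SMAC's implementation:

```python
import random


def parego_scalarize(costs: list[float], weights: list[float], rho: float = 0.05) -> float:
    """Augmented Tchebycheff scalarization of already-normalized costs."""
    weighted = [w * c for w, c in zip(weights, costs)]
    return max(weighted) + rho * sum(weighted)


rng = random.Random(0)
costs = [0.3, 0.7]  # normalized objective values of one trial

# Each "training" draws new weights, so the scalarized value changes:
values = []
for _ in range(3):
    w0 = rng.random()
    values.append(parego_scalarize(costs, [w0, 1.0 - w0]))
```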
RunHistory¶
The runhistory holds all (un-)evaluated trials of the optimization run. You can use the runhistory to get (running) configs, (running) trials, trials of a specific config, and more. The runhistory encoder iterates over the runhistory to receive data for the surrogate model. The following code shows how to iterate over the runhistory:
```python
smac = HPOFacade(...)

# Iterate over all trials
for trial_info, trial_value in smac.runhistory.items():
    # Trial info
    config = trial_info.config
    instance = trial_info.instance
    budget = trial_info.budget
    seed = trial_info.seed

    # Trial value
    cost = trial_value.cost
    time = trial_value.time
    status = trial_value.status
    starttime = trial_value.starttime
    endtime = trial_value.endtime
    additional_info = trial_value.additional_info

# Iterate over all configs
for config in smac.runhistory.get_configs():
    # Get the average cost of all trials of this config
    average_cost = smac.runhistory.average_cost(config)
```
Warning
The intensifier uses a callback to update the incumbent every time a new trial is added to the runhistory.
RunHistory Encoder¶
The runhistory encoder is used to encode the runhistory data into a format that can be used by the surrogate model.
Only trials with a status in `considered_states`, as well as timeout trials, are considered. Multi-objective values are scalarized using the `normalize_costs` function (which uses `objective_bounds` from the runhistory). Afterwards, the normalized value is processed by the multi-objective algorithm.
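The normalization step can be sketched as a per-objective min-max scaling using the bounds observed in the runhistory. The function below mirrors the idea rather than SMAC's exact `normalize_costs`:

```python
def normalize_costs(costs, bounds):
    """Scale each objective value into [0, 1] using (lower, upper) bounds."""
    normalized = []
    for value, (lower, upper) in zip(costs, bounds):
        if upper == lower:
            normalized.append(1.0)  # degenerate bounds: fall back to a constant
        else:
            normalized.append((value - lower) / (upper - lower))
    return normalized


# Two objectives on very different scales become comparable:
result = normalize_costs([50.0, 0.2], bounds=[(0.0, 100.0), (0.0, 0.4)])
```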
Callback¶
Callbacks provide the ability to easily execute code before, inside, and after the Bayesian optimization loop.
To add a callback, inherit from `smac.Callback` and override its methods as needed. Afterwards, you can pass your callbacks to any facade.
```python
from smac import MultiFidelityFacade, Callback

# Note: SMBO, RunInfo, and RunValue are used in the type hints below;
# their import paths depend on your SMAC version.


class CustomCallback(Callback):
    def on_start(self, smbo: SMBO) -> None:
        pass

    def on_end(self, smbo: SMBO) -> None:
        pass

    def on_iteration_start(self, smbo: SMBO) -> None:
        pass

    def on_iteration_end(self, smbo: SMBO, info: RunInfo, value: RunValue) -> bool | None:
        # We just do a simple printing here
        print(info, value)


smac = MultiFidelityFacade(
    ...,
    callbacks=[CustomCallback()],
)
smac.optimize()
```