Multi-Layer Perceptron Using Multiple Epochs
Example for optimizing a Multi-Layer Perceptron (MLP) using multiple budgets.
Since we want to take advantage of multi-fidelity optimization, the MultiFidelityFacade is a good choice. By default, the MultiFidelityFacade internally runs with Hyperband as the intensifier, which combines an aggressive racing mechanism with successive halving. Crucially, the target function must accept a budget parameter that tells it how much fidelity SMAC wants to allocate to the given configuration.
Since the MLP is a neural network trained iteratively, we choose the number of epochs as the fidelity type. The budget therefore specifies how many epochs SMAC allocates to a configuration. We use the digits dataset and optimize the average accuracy of a 5-fold cross-validation.
Note
This example uses the MultiFidelityFacade, which is the closest implementation to BOHB.
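To make the budget values in the output below easier to read: with min budget 5, max budget 25 and eta 3, successive halving evaluates configurations on a geometric ladder of budgets. The following sketch (an illustration of the usual successive-halving schedule under these assumptions, not code taken from SMAC internals) reproduces the 8.33 and 25 budget levels that appear in the log:

import math

min_budget, max_budget, eta = 5, 25, 3

# Keep dividing max_budget by eta while the result stays at or above min_budget.
n_levels = int(math.floor(math.log(max_budget / min_budget, eta))) + 1

# Budget levels from smallest to largest: [8.33, 25.0] for the values above,
# matching the "budget [8.33 / 25]" and "[25.00 / 25]" entries in the log.
budgets = [max_budget / eta**i for i in reversed(range(n_levels))]
print([round(b, 2) for b in budgets])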
[WARNING][successive_halving.py:123] The target function is specified to be non-deterministic, but number of seeds to evaluate are set to 1. Consider increasing `n_seeds` from the intensifier.
[INFO][successive_halving.py:197] Using successive halving with budget type BUDGETS, min budget 5, max budget 25 and eta 3.
[INFO][abstract_initial_design.py:133] Using 5 initial design and 0 additional configurations.
[WARNING][abstract_parallel_intensifier.py:93] Hyperband is executed with 1 worker(s) only. However, your system supports up to 2 workers. Consider increasing the workers in the scenario.
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 1-1 with initial budget 8.
[INFO][successive_halving_worker.py:396] First run and no incumbent provided. Challenger is assumed to be the incumbent.
[INFO][successive_halving_worker.py:613] Challenger (0.0495) is better than incumbent (0.0568) on budget 8.3333.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- batch_size: None -> 155
[INFO][abstract_intensifier.py:367] --- learning_rate_init: None -> 0.006677306766018313
[INFO][abstract_intensifier.py:367] --- n_layer: 4 -> 5
[INFO][abstract_intensifier.py:367] --- n_neurons: 123 -> 56
[INFO][abstract_intensifier.py:367] --- solver: 'lbfgs' -> 'adam'
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 1-2 with initial budget 25.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [25.00 / 25] and 2 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 2-1 with initial budget 8.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:613] Challenger (0.0345) is better than incumbent (0.0406) on budget 25.0000.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- activation: 'tanh' -> 'relu'
[INFO][abstract_intensifier.py:367] --- batch_size: 155 -> 117
[INFO][abstract_intensifier.py:367] --- learning_rate: None -> 'adaptive'
[INFO][abstract_intensifier.py:367] --- learning_rate_init: 0.006677306766018313 -> 0.008282754423021153
[INFO][abstract_intensifier.py:367] --- n_neurons: 56 -> 52
[INFO][abstract_intensifier.py:367] --- solver: 'adam' -> 'sgd'
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 2-2 with initial budget 25.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [25.00 / 25] and 2 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 3-1 with initial budget 8.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:613] Challenger (0.0189) is better than incumbent (0.0345) on budget 25.0000.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- batch_size: 117 -> 91
[INFO][abstract_intensifier.py:367] --- learning_rate_init: 0.008282754423021153 -> 0.006390670908394775
[INFO][abstract_intensifier.py:367] --- n_neurons: 52 -> 183
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 3-2 with initial budget 25.
[INFO][base_smbo.py:260] Configuration budget is exhausted.
[INFO][abstract_facade.py:325] Final Incumbent: {'activation': 'relu', 'n_layer': 5, 'n_neurons': 183, 'solver': 'sgd', 'batch_size': 91, 'learning_rate': 'adaptive', 'learning_rate_init': 0.006390670908394775}
[INFO][abstract_facade.py:326] Estimated cost: 0.01892138656762621
Default cost: 0.4307319715258433
Incumbent cost: 0.025047972763850068
import warnings
import numpy as np
from ConfigSpace import (
Categorical,
Configuration,
ConfigurationSpace,
EqualsCondition,
Float,
InCondition,
Integer,
)
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from smac import MultiFidelityFacade, Scenario
__copyright__ = "Copyright 2021, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"
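# Load the dataset once at module level. The digits dataset contains 1,797 8x8 grayscale
# images of handwritten digits (10 classes), flattened to 64 features per sample.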
digits = load_digits()
class MLP:
@property
def configspace(self) -> ConfigurationSpace:
# Build Configuration Space which defines all parameters and their ranges.
# To illustrate different parameter types, we use continuous, integer and categorical parameters.
cs = ConfigurationSpace()
n_layer = Integer("n_layer", (1, 5), default=1)
n_neurons = Integer("n_neurons", (8, 256), log=True, default=10)
activation = Categorical("activation", ["logistic", "tanh", "relu"], default="tanh")
solver = Categorical("solver", ["lbfgs", "sgd", "adam"], default="adam")
batch_size = Integer("batch_size", (30, 300), default=200)
learning_rate = Categorical("learning_rate", ["constant", "invscaling", "adaptive"], default="constant")
learning_rate_init = Float("learning_rate_init", (0.0001, 1.0), default=0.001, log=True)
# Add all hyperparameters at once:
cs.add_hyperparameters([n_layer, n_neurons, activation, solver, batch_size, learning_rate, learning_rate_init])
# Adding conditions to restrict the hyperparameter space...
# ... since learning rate is used when solver is 'sgd'.
use_lr = EqualsCondition(child=learning_rate, parent=solver, value="sgd")
# ... since learning rate initialization will only be accounted for when using 'sgd' or 'adam'.
use_lr_init = InCondition(child=learning_rate_init, parent=solver, values=["sgd", "adam"])
# ... since batch size will not be considered when optimizer is 'lbfgs'.
use_batch_size = InCondition(child=batch_size, parent=solver, values=["sgd", "adam"])
# We can also add multiple conditions on hyperparameters at once:
cs.add_conditions([use_lr, use_batch_size, use_lr_init])
return cs
def train(self, config: Configuration, seed: int = 0, budget: int = 25) -> float:
# For deactivated parameters (by virtue of the conditions),
# the configuration stores None-values.
# This is not accepted by the MLP, so we replace them with placeholder values.
lr = config["learning_rate"] if config["learning_rate"] else "constant"
lr_init = config["learning_rate_init"] if config["learning_rate_init"] else 0.001
batch_size = config["batch_size"] if config["batch_size"] else 200
with warnings.catch_warnings():
warnings.filterwarnings("ignore")
classifier = MLPClassifier(
hidden_layer_sizes=[config["n_neurons"]] * config["n_layer"],
solver=config["solver"],
batch_size=batch_size,
activation=config["activation"],
learning_rate=lr,
learning_rate_init=lr_init,
max_iter=int(np.ceil(budget)),
random_state=seed,
)
            # Compute the 5-fold cross-validation accuracy (SMAC minimizes, so we return 1 - accuracy below)
cv = StratifiedKFold(n_splits=5, random_state=seed, shuffle=True) # to make CV splits consistent
score = cross_val_score(classifier, digits.data, digits.target, cv=cv, error_score="raise")
return 1 - np.mean(score)
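# Illustrative note (not part of the original example): the target function can also be
# called directly with a configuration, a seed and a budget, e.g.
#
#   mlp = MLP()
#   error = mlp.train(mlp.configspace.get_default_configuration(), seed=0, budget=5)
#
# which trains the default configuration for 5 epochs and returns 1 - mean CV accuracy.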
if __name__ == "__main__":
mlp = MLP()
# Define our environment variables
scenario = Scenario(
mlp.configspace,
walltime_limit=40, # After 40 seconds, we stop the hyperparameter optimization
n_trials=200, # Evaluate max 200 different trials
min_budget=5, # Train the MLP using a hyperparameter configuration for at least 5 epochs
max_budget=25, # Train the MLP using a hyperparameter configuration for at most 25 epochs
n_workers=1,
)
# We want to run five random configurations before starting the optimization.
initial_design = MultiFidelityFacade.get_initial_design(scenario, n_configs=5)
# Create our SMAC object and pass the scenario and the train method
smac = MultiFidelityFacade(
scenario,
mlp.train,
initial_design=initial_design,
overwrite=True,
)
# Let's optimize
incumbent = smac.optimize()
# Get cost of default configuration
default_cost = smac.validate(mlp.configspace.get_default_configuration())
print(f"Default cost: {default_cost}")
# Let's calculate the cost of the incumbent
incumbent_cost = smac.validate(incumbent)
print(f"Incumbent cost: {incumbent_cost}")
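    # Illustrative addition (not in the original example): a Configuration behaves like a
    # mapping, so the incumbent's hyperparameter values can also be inspected directly:
    # print(dict(incumbent))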
Total running time of the script: ( 0 minutes 50.483 seconds)