Multi-Layer Perceptron Using Multiple Epochs

Example for optimizing a Multi-Layer Perceptron (MLP) using multiple budgets. Since we want to take advantage of multi-fidelity optimization, the MultiFidelityFacade is a good choice. By default, the MultiFidelityFacade runs Hyperband as its intensifier, which combines an aggressive racing mechanism with successive halving. Crucially, the target function must accept a budget parameter that tells it how much fidelity SMAC wants to allocate to the given configuration.

Since the MLP is trained iteratively, we choose the number of epochs as the fidelity type; the budget therefore specifies how many epochs SMAC allocates to a configuration. We optimize the average accuracy of a 5-fold cross-validation on the digits dataset.
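
To illustrate the required signature, here is a minimal, self-contained sketch of a budget-aware target function. The function name and the toy cost computation are hypothetical placeholders and are not part of the full example further down.

import numpy as np


def toy_target_function(config, seed: int = 0, budget: float = 25) -> float:
    # SMAC passes a budget between min_budget and max_budget;
    # in this example it is interpreted as the number of training epochs.
    n_epochs = int(np.ceil(budget))

    # Pretend "training": more epochs -> lower cost, plus some seeded noise.
    rng = np.random.default_rng(seed)
    cost = 1.0 / n_epochs + rng.normal(0.0, 0.01)

    # SMAC minimizes the returned value.
    return float(cost)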

Note

This example uses the MultiFidelityFacade, which is the SMAC facade closest to BOHB.
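
If you want more control over the Hyperband behaviour, you can construct the intensifier yourself and pass it to the facade. The sketch below reuses the MLP class from the full example further down; the keyword names (eta, n_seeds) follow SMAC 2.x and may differ in other versions.

from smac import MultiFidelityFacade, Scenario

mlp = MLP()  # the MLP class defined in the full example below
scenario = Scenario(mlp.configspace, n_trials=200, min_budget=5, max_budget=25)

# eta is the halving factor; n_seeds controls how many seeds are evaluated
# per configuration (see the n_seeds warning in the log output below).
intensifier = MultiFidelityFacade.get_intensifier(scenario, eta=3, n_seeds=1)

smac = MultiFidelityFacade(scenario, mlp.train, intensifier=intensifier, overwrite=True)
incumbent = smac.optimize()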

[WARNING][successive_halving.py:123] The target function is specified to be non-deterministic, but number of seeds to evaluate are set to 1. Consider increasing `n_seeds` from the intensifier.
[INFO][successive_halving.py:197] Using successive halving with budget type BUDGETS, min budget 5, max budget 25 and eta 3.
[INFO][abstract_initial_design.py:133] Using 5 initial design and 0 additional configurations.
[WARNING][abstract_parallel_intensifier.py:93] Hyperband is executed with 1 worker(s) only. However, your system supports up to 2 workers. Consider increasing the workers in the scenario.
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 1-1 with initial budget 8.
[INFO][successive_halving_worker.py:396] First run and no incumbent provided. Challenger is assumed to be the incumbent.
[INFO][successive_halving_worker.py:613] Challenger (0.0495) is better than incumbent (0.0568) on budget 8.3333.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- batch_size: None -> 155
[INFO][abstract_intensifier.py:367] --- learning_rate_init: None -> 0.006677306766018313
[INFO][abstract_intensifier.py:367] --- n_layer: 4 -> 5
[INFO][abstract_intensifier.py:367] --- n_neurons: 123 -> 56
[INFO][abstract_intensifier.py:367] --- solver: 'lbfgs' -> 'adam'
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 1-2 with initial budget 25.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [25.00 / 25] and 2 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 2-1 with initial budget 8.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:613] Challenger (0.0345) is better than incumbent (0.0406) on budget 25.0000.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- activation: 'tanh' -> 'relu'
[INFO][abstract_intensifier.py:367] --- batch_size: 155 -> 117
[INFO][abstract_intensifier.py:367] --- learning_rate: None -> 'adaptive'
[INFO][abstract_intensifier.py:367] --- learning_rate_init: 0.006677306766018313 -> 0.008282754423021153
[INFO][abstract_intensifier.py:367] --- n_neurons: 56 -> 52
[INFO][abstract_intensifier.py:367] --- solver: 'adam' -> 'sgd'
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 2-2 with initial budget 25.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [25.00 / 25] and 2 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 3-1 with initial budget 8.
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-1 with budget [8.33 / 25] and 3 evaluated challenger(s).
[INFO][successive_halving_worker.py:613] Challenger (0.0189) is better than incumbent (0.0345) on budget 25.0000.
[INFO][abstract_intensifier.py:364] Changes in incumbent:
[INFO][abstract_intensifier.py:367] --- batch_size: 117 -> 91
[INFO][abstract_intensifier.py:367] --- learning_rate_init: 0.008282754423021153 -> 0.006390670908394775
[INFO][abstract_intensifier.py:367] --- n_neurons: 52 -> 183
[INFO][successive_halving_worker.py:245] Finished Successive Halving iteration-step 1-2 with budget [25.00 / 25] and 1 evaluated challenger(s).
[INFO][hyperband_worker.py:165] Finished Hyperband iteration-step 3-2 with initial budget 25.
[INFO][base_smbo.py:260] Configuration budget is exhausted.
[INFO][abstract_facade.py:325] Final Incumbent: {'activation': 'relu', 'n_layer': 5, 'n_neurons': 183, 'solver': 'sgd', 'batch_size': 91, 'learning_rate': 'adaptive', 'learning_rate_init': 0.006390670908394775}
[INFO][abstract_facade.py:326] Estimated cost: 0.01892138656762621
Default cost: 0.4307319715258433
Incumbent cost: 0.025047972763850068

import warnings

import numpy as np
from ConfigSpace import (
    Categorical,
    Configuration,
    ConfigurationSpace,
    EqualsCondition,
    Float,
    InCondition,
    Integer,
)
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

from smac import MultiFidelityFacade, Scenario

__copyright__ = "Copyright 2021, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"


digits = load_digits()


class MLP:
    @property
    def configspace(self) -> ConfigurationSpace:
        # Build Configuration Space which defines all parameters and their ranges.
        # To illustrate different parameter types, we use continuous, integer and categorical parameters.
        cs = ConfigurationSpace()

        n_layer = Integer("n_layer", (1, 5), default=1)
        n_neurons = Integer("n_neurons", (8, 256), log=True, default=10)
        activation = Categorical("activation", ["logistic", "tanh", "relu"], default="tanh")
        solver = Categorical("solver", ["lbfgs", "sgd", "adam"], default="adam")
        batch_size = Integer("batch_size", (30, 300), default=200)
        learning_rate = Categorical("learning_rate", ["constant", "invscaling", "adaptive"], default="constant")
        learning_rate_init = Float("learning_rate_init", (0.0001, 1.0), default=0.001, log=True)

        # Add all hyperparameters at once:
        cs.add_hyperparameters([n_layer, n_neurons, activation, solver, batch_size, learning_rate, learning_rate_init])

        # Adding conditions to restrict the hyperparameter space...
        # ... since the learning rate schedule is only used when the solver is 'sgd'.
        use_lr = EqualsCondition(child=learning_rate, parent=solver, value="sgd")
        # ... since the initial learning rate is only used by 'sgd' and 'adam'.
        use_lr_init = InCondition(child=learning_rate_init, parent=solver, values=["sgd", "adam"])
        # ... since the batch size is not considered when the solver is 'lbfgs'.
        use_batch_size = InCondition(child=batch_size, parent=solver, values=["sgd", "adam"])

        # We can also add multiple conditions on hyperparameters at once:
        cs.add_conditions([use_lr, use_batch_size, use_lr_init])

        return cs

    def train(self, config: Configuration, seed: int = 0, budget: int = 25) -> float:
        # For deactivated parameters (by virtue of the conditions),
        # the configuration stores None-values.
        # This is not accepted by the MLP, so we replace them with placeholder values.
        lr = config["learning_rate"] if config["learning_rate"] else "constant"
        lr_init = config["learning_rate_init"] if config["learning_rate_init"] else 0.001
        batch_size = config["batch_size"] if config["batch_size"] else 200

        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")

            classifier = MLPClassifier(
                hidden_layer_sizes=[config["n_neurons"]] * config["n_layer"],
                solver=config["solver"],
                batch_size=batch_size,
                activation=config["activation"],
                learning_rate=lr,
                learning_rate_init=lr_init,
                max_iter=int(np.ceil(budget)),
                random_state=seed,
            )

            # Compute the 5-fold cross-validation accuracy (SMAC minimizes, so we return 1 - accuracy)
            cv = StratifiedKFold(n_splits=5, random_state=seed, shuffle=True)  # to make CV splits consistent
            score = cross_val_score(classifier, digits.data, digits.target, cv=cv, error_score="raise")

        return 1 - np.mean(score)


if __name__ == "__main__":
    mlp = MLP()

    # Define our environment variables
    scenario = Scenario(
        mlp.configspace,
        walltime_limit=40,  # After 40 seconds, we stop the hyperparameter optimization
        n_trials=200,  # Evaluate max 200 different trials
        min_budget=5,  # Train the MLP using a hyperparameter configuration for at least 5 epochs
        max_budget=25,  # Train the MLP using a hyperparameter configuration for at most 25 epochs
        n_workers=1,
    )

    # We want to run five random configurations before starting the optimization.
    initial_design = MultiFidelityFacade.get_initial_design(scenario, n_configs=5)

    # Create our SMAC object and pass the scenario and the train method
    smac = MultiFidelityFacade(
        scenario,
        mlp.train,
        initial_design=initial_design,
        overwrite=True,
    )

    # Let's optimize
    incumbent = smac.optimize()

    # Get cost of default configuration
    default_cost = smac.validate(mlp.configspace.get_default_configuration())
    print(f"Default cost: {default_cost}")

    # Let's calculate the cost of the incumbent
    incumbent_cost = smac.validate(incumbent)
    print(f"Incumbent cost: {incumbent_cost}")

Total running time of the script: ( 0 minutes 50.483 seconds)