Multi-Layer Perceptron Using Multiple Epochs

Example of optimizing a Multi-Layer Perceptron (MLP) using multiple budgets. Since we want to take advantage of multi-fidelity optimization, the MultiFidelityFacade is a good choice. By default, MultiFidelityFacade uses Hyperband as its intensifier, which combines an aggressive racing mechanism with Successive Halving. Crucially, the target function must accept a budget argument, which specifies how much fidelity SMAC allocates to the configuration. In this example, we run both SuccessiveHalving and Hyperband and compare the results.
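As a rough illustration of how the stage sizes and budgets reported in the log output below can arise, the following sketch reproduces them for eta 3, min budget 1, and max budget 25. The helpers `successive_halving_schedule` and `hyperband_schedule` are hypothetical names written for this illustration. They follow the textbook Successive Halving and Hyperband formulas and are not SMAC's internal code.

```python
import math


def successive_halving_schedule(min_budget: float, max_budget: float, eta: int = 3):
    """Return (n_configs, budgets) per stage of a single Successive Halving bracket."""
    # How many times the budget can grow by a factor of eta between min and max.
    max_iter = math.floor(math.log(max_budget / min_budget) / math.log(eta))
    # Each stage keeps 1/eta of the configurations and gives them eta times the budget.
    n_configs = [eta ** (max_iter - i) for i in range(max_iter + 1)]
    budgets = [max_budget / eta ** (max_iter - i) for i in range(max_iter + 1)]
    return n_configs, budgets


def hyperband_schedule(min_budget: float, max_budget: float, eta: int = 3):
    """Return a list of (n_configs, budgets) tuples, one per Hyperband bracket."""
    max_iter = math.floor(math.log(max_budget / min_budget) / math.log(eta))
    _, all_budgets = successive_halving_schedule(min_budget, max_budget, eta)
    brackets = []
    for bracket in range(max_iter + 1):
        s = max_iter - bracket  # number of halving steps left in this bracket
        n_start = math.ceil((max_iter + 1) / (s + 1) * eta**s)
        n_configs = [math.floor(n_start / eta**i) for i in range(s + 1)]
        # Later brackets skip the cheapest budgets and start at a higher fidelity.
        brackets.append((n_configs, all_budgets[bracket:]))
    return brackets


if __name__ == "__main__":
    print(successive_halving_schedule(1, 25, eta=3))
    for index, (n_configs, budgets) in enumerate(hyperband_schedule(1, 25, eta=3)):
        print(f"Bracket {index}: configs={n_configs}, budgets={budgets}")
```

With these settings the sketch yields stage sizes [9, 3, 1] for Successive Halving and [9, 3, 1], [5, 1], [3] for the three Hyperband brackets, with budgets of roughly 2.78, 8.33, and 25.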

An MLP is a deep neural network, so we choose epochs as the fidelity type. This means that budget specifies the number of epochs SMAC allocates to a configuration. The digits dataset is used, and the objective is the average accuracy over 5-fold cross-validation.
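Budgets are not necessarily whole numbers. The script below turns a budget into an epoch count with `int(np.ceil(budget))` before passing it as `max_iter`. A minimal sketch of that rounding, using the three budgets that appear in the log output (the variable names here are illustrative only):

```python
import math

# Budgets as reported by the intensifier for eta=3, min_budget=1, max_budget=25.
budgets = [25 / 9, 25 / 3, 25.0]

# Round each fractional budget up to a whole number of epochs.
epochs = [int(math.ceil(budget)) for budget in budgets]

print(epochs)  # [3, 9, 25]
```

Rounding up means the lowest fidelity trains for 3 epochs rather than 2, so no configuration ever receives less than its allocated budget.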

Note

This example uses the MultiFidelityFacade, which is the closest implementation to BOHB.

[INFO][abstract_facade.py:198] Workers are reduced to 8.
[INFO][abstract_initial_design.py:147] Using 5 initial design configurations and 0 additional configurations.
[INFO][successive_halving.py:164] Successive Halving uses budget type BUDGETS with eta 3, min budget 1, and max budget 25.
[INFO][successive_halving.py:323] Number of configs in stage:
[INFO][successive_halving.py:325] --- Bracket 0: [9, 3, 1]
[INFO][successive_halving.py:327] Budgets in stage:
[INFO][successive_halving.py:329] --- Bracket 0: [2.7777777777777777, 8.333333333333332, 25.0]
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][abstract_intensifier.py:515] Added config 8ea495 as new incumbent because there are no incumbents yet.
[INFO][abstract_intensifier.py:590] Added config 96852f and rejected config 8ea495 as incumbent because it is not better than the incumbents on 1 instances:
[INFO][configspace.py:175] --- activation: 'tanh' -> 'relu'
[INFO][configspace.py:175] --- batch_size: 214 -> 137
[INFO][configspace.py:175] --- learning_rate_init: 0.005599223654063347 -> 0.004991219262649131
[INFO][configspace.py:175] --- n_layer: 4 -> 5
[INFO][configspace.py:175] --- n_neurons: 66 -> 52
[INFO][smbo.py:299] Finished 100 trials.
[INFO][smbo.py:299] Finished 100 trials.
[INFO][smbo.py:307] Configuration budget is exhausted:
[INFO][smbo.py:308] --- Remaining wallclock time: -1.525974988937378
[INFO][smbo.py:309] --- Remaining cpu time: inf
[INFO][smbo.py:310] --- Remaining trials: 391
[INFO][abstract_intensifier.py:590] Added config 035a84 and rejected config 96852f as incumbent because it is not better than the incumbents on 1 instances:
[INFO][configspace.py:175] --- batch_size: 137 -> 259
[INFO][configspace.py:175] --- learning_rate: None -> 'constant'
[INFO][configspace.py:175] --- learning_rate_init: 0.004991219262649131 -> 0.0075638421362735015
[INFO][configspace.py:175] --- n_layer: 5 -> 4
[INFO][configspace.py:175] --- n_neurons: 52 -> 183
[INFO][configspace.py:175] --- solver: 'adam' -> 'sgd'
Default cost (SuccessiveHalving): 0.36672856700711853
Incumbent cost (SuccessiveHalving): 0.021148251315382338
[INFO][abstract_initial_design.py:82] Using `n_configs` and ignoring `n_configs_per_hyperparameter`.
[INFO][abstract_facade.py:198] Workers are reduced to 8.
/opt/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44115 instead
  warnings.warn(
[INFO][abstract_initial_design.py:147] Using 5 initial design configurations and 0 additional configurations.
[INFO][successive_halving.py:164] Successive Halving uses budget type BUDGETS with eta 3, min budget 1, and max budget 25.
[INFO][successive_halving.py:323] Number of configs in stage:
[INFO][successive_halving.py:325] --- Bracket 0: [9, 3, 1]
[INFO][successive_halving.py:325] --- Bracket 1: [5, 1]
[INFO][successive_halving.py:325] --- Bracket 2: [3]
[INFO][successive_halving.py:327] Budgets in stage:
[INFO][successive_halving.py:329] --- Bracket 0: [2.7777777777777777, 8.333333333333332, 25.0]
[INFO][successive_halving.py:329] --- Bracket 1: [8.333333333333332, 25.0]
[INFO][successive_halving.py:329] --- Bracket 2: [25.0]
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][smbo.py:299] Finished 0 trials.
[INFO][abstract_intensifier.py:515] Added config 3e854d as new incumbent because there are no incumbents yet.
[INFO][abstract_intensifier.py:590] Added config e27357 and rejected config 3e854d as incumbent because it is not better than the incumbents on 1 instances:
[INFO][configspace.py:175] --- activation: 'relu' -> 'logistic'
[INFO][configspace.py:175] --- batch_size: None -> 103
[INFO][configspace.py:175] --- learning_rate_init: None -> 0.02113934324402315
[INFO][configspace.py:175] --- n_layer: 4 -> 1
[INFO][configspace.py:175] --- n_neurons: 9 -> 19
[INFO][configspace.py:175] --- solver: 'lbfgs' -> 'adam'
[INFO][abstract_intensifier.py:590] Added config 704d44 and rejected config e27357 as incumbent because it is not better than the incumbents on 1 instances:
[INFO][configspace.py:175] --- batch_size: 103 -> 31
[INFO][configspace.py:175] --- learning_rate_init: 0.02113934324402315 -> 0.0038549874043245355
[INFO][configspace.py:175] --- n_neurons: 19 -> 87
[INFO][smbo.py:307] Configuration budget is exhausted:
[INFO][smbo.py:308] --- Remaining wallclock time: -1.3017466068267822
[INFO][smbo.py:309] --- Remaining cpu time: inf
[INFO][smbo.py:310] --- Remaining trials: 424
Default cost (Hyperband): 0.36672856700711853
Incumbent cost (Hyperband): 0.0178087279480037
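
The costs printed above are error rates, since the target function in the script returns `1 - np.mean(score)`. A small sketch of converting the logged costs back into mean cross-validation accuracy (the dictionary keys are labels chosen for this illustration):

```python
# Costs copied from the output above; cost = 1 - mean cross-validation accuracy.
logged_costs = {
    "Default": 0.36672856700711853,
    "Incumbent (SuccessiveHalving)": 0.021148251315382338,
    "Incumbent (Hyperband)": 0.0178087279480037,
}

accuracies = {name: 1.0 - cost for name, cost in logged_costs.items()}

for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.4f}")
```

In this run, the default configuration reaches an accuracy of about 0.63, while both incumbents reach about 0.98.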

import warnings

import matplotlib.pyplot as plt
import numpy as np
from ConfigSpace import (
    Categorical,
    Configuration,
    ConfigurationSpace,
    EqualsCondition,
    Float,
    InCondition,
    Integer,
)
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

from smac import MultiFidelityFacade as MFFacade
from smac import Scenario
from smac.facade import AbstractFacade
from smac.intensifier.hyperband import Hyperband
from smac.intensifier.successive_halving import SuccessiveHalving

__copyright__ = "Copyright 2021, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"


dataset = load_digits()


class MLP:
    @property
    def configspace(self) -> ConfigurationSpace:
        # Build Configuration Space which defines all parameters and their ranges.
        # To illustrate different parameter types, we use continuous, integer and categorical parameters.
        cs = ConfigurationSpace()

        n_layer = Integer("n_layer", (1, 5), default=1)
        n_neurons = Integer("n_neurons", (8, 256), log=True, default=10)
        activation = Categorical("activation", ["logistic", "tanh", "relu"], default="tanh")
        solver = Categorical("solver", ["lbfgs", "sgd", "adam"], default="adam")
        batch_size = Integer("batch_size", (30, 300), default=200)
        learning_rate = Categorical("learning_rate", ["constant", "invscaling", "adaptive"], default="constant")
        learning_rate_init = Float("learning_rate_init", (0.0001, 1.0), default=0.001, log=True)

        # Add all hyperparameters at once:
        cs.add_hyperparameters([n_layer, n_neurons, activation, solver, batch_size, learning_rate, learning_rate_init])

        # Adding conditions to restrict the hyperparameter space...
        # ... since learning rate is only used when solver is 'sgd'.
        use_lr = EqualsCondition(child=learning_rate, parent=solver, value="sgd")
        # ... since learning rate initialization will only be accounted for when using 'sgd' or 'adam'.
        use_lr_init = InCondition(child=learning_rate_init, parent=solver, values=["sgd", "adam"])
        # ... since batch size will not be considered when optimizer is 'lbfgs'.
        use_batch_size = InCondition(child=batch_size, parent=solver, values=["sgd", "adam"])

        # We can also add multiple conditions on hyperparameters at once:
        cs.add_conditions([use_lr, use_batch_size, use_lr_init])

        return cs

    def train(self, config: Configuration, seed: int = 0, budget: float = 25) -> float:
        # For deactivated parameters (by virtue of the conditions),
        # the configuration stores None-values.
        # This is not accepted by the MLP, so we replace them with placeholder values.
        lr = config["learning_rate"] if config["learning_rate"] else "constant"
        lr_init = config["learning_rate_init"] if config["learning_rate_init"] else 0.001
        batch_size = config["batch_size"] if config["batch_size"] else 200

        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")

            classifier = MLPClassifier(
                hidden_layer_sizes=[config["n_neurons"]] * config["n_layer"],
                solver=config["solver"],
                batch_size=batch_size,
                activation=config["activation"],
                learning_rate=lr,
                learning_rate_init=lr_init,
                max_iter=int(np.ceil(budget)),
                random_state=seed,
            )

            # Returns the 5-fold cross validation accuracy
            cv = StratifiedKFold(n_splits=5, random_state=seed, shuffle=True)  # to make CV splits consistent
            score = cross_val_score(classifier, dataset.data, dataset.target, cv=cv, error_score="raise")

        return 1 - np.mean(score)


def plot_trajectory(facades: list[AbstractFacade]) -> None:
    """Plots the trajectory (incumbents) of the optimization process."""
    plt.figure()
    plt.title("Trajectory")
    plt.xlabel("Wallclock time [s]")
    plt.ylabel(facades[0].scenario.objectives)
    plt.ylim(0, 0.4)

    for facade in facades:
        X, Y = [], []
        for item in facade.intensifier.trajectory:
            # Single-objective optimization
            assert len(item.config_ids) == 1
            assert len(item.costs) == 1

            y = item.costs[0]
            x = item.walltime

            X.append(x)
            Y.append(y)

        plt.plot(X, Y, label=facade.intensifier.__class__.__name__)
        plt.scatter(X, Y, marker="x")

    plt.legend()
    plt.show()


if __name__ == "__main__":
    mlp = MLP()

    facades: list[AbstractFacade] = []
    for intensifier_object in [SuccessiveHalving, Hyperband]:
        # Define our environment variables
        scenario = Scenario(
            mlp.configspace,
            walltime_limit=60,  # After 60 seconds, we stop the hyperparameter optimization
            n_trials=500,  # Evaluate max 500 different trials
            min_budget=1,  # Train the MLP using a hyperparameter configuration for at least 1 epoch
            max_budget=25,  # Train the MLP using a hyperparameter configuration for at most 25 epochs
            n_workers=8,
        )

        # We want to run five random configurations before starting the optimization.
        initial_design = MFFacade.get_initial_design(scenario, n_configs=5)

        # Create our intensifier
        intensifier = intensifier_object(scenario, incumbent_selection="highest_budget")

        # Create our SMAC object and pass the scenario and the train method
        smac = MFFacade(
            scenario,
            mlp.train,
            initial_design=initial_design,
            intensifier=intensifier,
            overwrite=True,
        )

        # Let's optimize
        incumbent = smac.optimize()

        # Get cost of default configuration
        default_cost = smac.validate(mlp.configspace.get_default_configuration())
        print(f"Default cost ({intensifier.__class__.__name__}): {default_cost}")

        # Let's calculate the cost of the incumbent
        incumbent_cost = smac.validate(incumbent)
        print(f"Incumbent cost ({intensifier.__class__.__name__}): {incumbent_cost}")

        facades.append(smac)

    # Let's plot it
    plot_trajectory(facades)

Total running time of the script: ( 2 minutes 57.953 seconds)
