The ToySGD Benchmark
This artificial benchmark uses toy functions such as polynomials to test a DAC controller's ability to control both the learning rate and the momentum of SGD. At each step until the cutoff, both hyperparameters are updated and one optimization step is taken. Because the global optimum of the function is known, the cost is measured as the current regret.
Because it relies on function approximation, this benchmark is computationally cheap, which likely makes it a good entry point before tackling the full-size SGD or CMA-ES step size benchmarks. It can also serve as a first test of whether a DAC method can handle multiple hyperparameters at the same time.
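The control loop described above (set both hyperparameters, take one optimization step, measure regret) can be sketched in plain Python. This is a minimal illustration and not DACBench's implementation: the quadratic objective, the classical heavy-ball momentum update, the fixed schedule, and all function names are assumptions made for the example.

```python
# Illustrative sketch only: objective, update rule and names are assumptions.

def objective(x):
    """Toy quadratic with a known global minimum of 0.5 at x = 1."""
    return (x - 1.0) ** 2 + 0.5


def gradient(x):
    """Analytic derivative of the toy quadratic."""
    return 2.0 * (x - 1.0)


F_OPTIMUM = 0.5  # known optimum, which is what makes regret computable


def run_sgd_momentum(x0, schedule):
    """Run one heavy-ball step per (learning_rate, momentum) pair in schedule."""
    x, velocity = x0, 0.0
    regrets = []
    for learning_rate, momentum in schedule:
        # Both hyperparameters may change at every step.
        velocity = momentum * velocity - learning_rate * gradient(x)
        x += velocity
        # Regret: gap between the current and the optimal function value.
        regrets.append(objective(x) - F_OPTIMUM)
    return x, regrets


# A static schedule stands in for a DAC policy here.
schedule = [(0.1, 0.5)] * 50
x_final, regrets = run_sgd_momentum(x0=-2.0, schedule=schedule)
print(f"final x = {x_final:.6f}, final regret = {regrets[-1]:.3e}")
```

A DAC controller would replace the static `schedule` with decisions made from the observed state at each step.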
Benchmark for ToySGD.
- class dacbench.benchmarks.toysgd_benchmark.ToySGDBenchmark(config_path=None, config=None)
Bases: AbstractBenchmark
SGD Benchmark with toy functions.
Environment for SGD with toy functions.
- class dacbench.envs.toysgd.ToySGDEnv(config)
Bases: AbstractMADACEnv
Optimize toy functions with SGD + Momentum.
Action: [log_learning_rate, log_momentum] (log base 10)
State: Dict with entries remaining_budget, gradient, learning_rate, momentum
Reward: negative log regret of current and true function value
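To make the action and reward conventions concrete, the sketch below decodes a log10-scaled action and computes a negative log regret. The helper names, the `default_momentum` fallback for scalar actions, the use of base 10 for the reward logarithm, and the `eps` floor are all assumptions for illustration; the environment's actual code may differ.

```python
import math


def decode_action(action, default_momentum=0.0):
    """Turn a log10-scaled action into (learning_rate, momentum).

    Hypothetical helper: a scalar is read as log_learning_rate only, and
    default_momentum is an assumed fallback for that case.
    """
    if isinstance(action, (int, float)):
        return 10.0 ** action, default_momentum
    log_learning_rate, log_momentum = action
    return 10.0 ** log_learning_rate, 10.0 ** log_momentum


def negative_log_regret(f_current, f_optimum, eps=1e-12):
    """Reward sketch: -log10 of the gap to the optimum (base 10 is assumed).

    eps is an assumed floor that avoids log(0) when the optimum is reached.
    """
    regret = abs(f_current - f_optimum)
    return -math.log10(max(regret, eps))


learning_rate, momentum = decode_action((-2.0, -1.0))
reward = negative_log_regret(f_current=1.01, f_optimum=1.0)
print(learning_rate, momentum, reward)
```

Under these assumptions a smaller regret gives a larger reward, so the reward grows as the optimizer approaches the optimum.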
An instance can look as follows:
ID              0
family          polynomial
order           2
low             -2
high            2
coefficients    [ 1.40501053 -0.59899755 1.43337392]
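The example instance can be read as a degree-2 polynomial. The sketch below evaluates it and its gradient in plain Python, assuming the coefficients are listed in increasing order of degree (the convention of NumPy's `Polynomial` class); that ordering is an assumption, not something the text above states. The meaning of `low` and `high` is likewise not specified here, so they are stored but not interpreted.

```python
# Instance values copied from the example above; helper names are hypothetical.
instance = {
    "ID": 0,
    "family": "polynomial",
    "order": 2,
    "low": -2,
    "high": 2,
    "coefficients": [1.40501053, -0.59899755, 1.43337392],
}


def evaluate(coefficients, x):
    """Evaluate sum_i c_i * x**i (assumes increasing-degree ordering)."""
    return sum(c * x**i for i, c in enumerate(coefficients))


def gradient(coefficients, x):
    """Derivative of the polynomial under the same ordering assumption."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coefficients) if i > 0)


def quadratic_minimum(coefficients):
    """Closed-form minimizer of c0 + c1*x + c2*x**2, valid only when c2 > 0."""
    c0, c1, c2 = coefficients
    x_star = -c1 / (2.0 * c2)
    return x_star, evaluate(coefficients, x_star)


coefficients = instance["coefficients"]
x_star, f_star = quadratic_minimum(coefficients)
print(f"minimizer = {x_star:.4f}, optimum value = {f_star:.4f}")
```

Under the assumed ordering the leading coefficient is positive, so the quadratic has a single global minimum and the regret at any point follows from `f_star`.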
- reset(seed=None, options=None)
Reset environment.
- Parameters:
seed (int) – seed
options (dict) – options dict (not used)
- Returns:
np.array – Environment state
dict – Meta-info
- step(action: float | tuple[float, float]) → tuple[dict[str, float], float, bool, dict]
Take one step with SGD.
- Parameters:
action (Union[float, Tuple[float, float]]) – If scalar, action = log_learning_rate. If tuple, action = (log_learning_rate, log_momentum).
- Returns:
Tuple[Dict[str, float], float, bool, Dict] –
state : Dict[str, float]
State with entries: “remaining_budget”, “gradient”, “learning_rate”, “momentum”
reward : float
terminated : bool
truncated : bool
info : Dict