arlbench.autorl package

Submodules

arlbench.autorl.autorl_env module

Automated Reinforcement Learning Environment.

class arlbench.autorl.autorl_env.AutoRLEnv(config=None)[source]

Bases: Env

Automated Reinforcement Learning (gymnasium-like) Environment.

With each reset, the algorithm state is (re-)initialized. If a checkpoint path is passed to reset, the agent state is initialized with the checkpointed state.

In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
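
A minimal usage sketch of this reset/step protocol (a sketch under assumptions: the default configuration is used, the action is sampled from config_space, and the printed objective keys are illustrative):

   from arlbench.autorl import AutoRLEnv

   env = AutoRLEnv()                      # default AutoRL configuration
   obs, info = env.reset()                # (re-)initializes the algorithm state

   # One iteration of training with a sampled hyperparameter configuration:
   action = env.config_space.sample_configuration()
   obs, objectives, terminated, truncated, info = env.step(action)
   print(objectives)                      # e.g. {"reward_mean": ...}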

ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}
property action_space: Space

Returns the hyperparameter configuration space as a gymnasium space.

Returns:

Hyperparameter configuration space.

Return type:

gymnasium.spaces.Space

property checkpoints: list[str]

Returns a list of created checkpoints for this AutoRL environment.

Returns:

List of checkpoint paths.

Return type:

list[str]

property config: dict

Returns the AutoRL configuration.

Returns:

AutoRL configuration.

Return type:

dict

property config_space: ConfigurationSpace

Returns the hyperparameter configuration space as a ConfigSpace ConfigurationSpace.

Returns:

Hyperparameter configuration space.

Return type:

ConfigurationSpace

eval(num_eval_episodes)[source]

Evaluates the algorithm using its current training state.

Parameters:

num_eval_episodes (int) – Number of evaluation episodes to run.

Returns:

Array of evaluation returns, one entry per episode.

Return type:

np.ndarray

property hpo_config: Configuration

Returns the current hyperparameter configuration stored in the AutoRL environment.

Returns:

Hyperparameter configuration.

Return type:

Configuration

property objectives: list[str]

Returns configured objectives.

Returns:

List of objectives.

Return type:

list[str]

property observation_space: Space

Returns the gymnasium space of state features (observations).

Returns:

Gymnasium space.

Return type:

gymnasium.spaces.Space

reset()[source]

Resets the AutoRL environment and current algorithm state.

Returns:

Empty observation and state information.

Return type:

tuple[ObservationT, InfoT]

step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]

Performs one iteration of RL training.

Parameters:
  • action (Configuration | dict) – Hyperparameter configuration to use for training.

  • checkpoint_path (str | None, optional) – If given, the algorithm state is initialized from this checkpoint before training. Defaults to None.

  • n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.

  • n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.

  • n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.

  • seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.

Raises:

ValueError – Raised if step() is called before reset().

Returns:

State information, objectives, terminated, truncated, additional information.

Return type:

tuple[ObservationT, ObjectivesT, bool, bool, InfoT]
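
A hedged sketch of a single tuning step with per-step overrides followed by an explicit evaluation; the concrete values are illustrative only and the configuration is sampled rather than tuned:

   from arlbench.autorl import AutoRLEnv

   env = AutoRLEnv()
   obs, info = env.reset()
   obs, objectives, terminated, truncated, info = env.step(
       action=env.config_space.sample_configuration(),
       n_total_timesteps=100_000,   # total training steps for this iteration
       n_eval_steps=10,             # number of evaluations during training
       n_eval_episodes=8,           # episodes per evaluation
       seed=42,
   )
   returns = env.eval(num_eval_episodes=16)   # np.ndarray with one return per episode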

arlbench.autorl.checkpointing module

Contains all checkpointing-related methods for the AutoRL environment.

class arlbench.autorl.checkpointing.Checkpointer[source]

Bases: object

Contains all checkpointing-related methods for the AutoRL environment.

MRP_FILE = 'max_recorded_priority.npy'
NODES_FILE = 'nodes.npy'
SCALARS_FILE = 'scalars.json'
static load(checkpoint_path, algorithm_state)[source]

Loads an AutoRL environment checkpoint.

Parameters:
  • checkpoint_path (str) – Path of the checkpoint.

  • algorithm_state (AlgorithmState) – Current algorithm state; certain attributes will be overridden by the checkpoint.

Returns:

Common AutoRL environment attributes as well as a dictionary for restoring the algorithm state: (hp_config, c_step, c_episode), algorithm_kw_args

Return type:

tuple[tuple[dict[str, Any], int, int], dict]

static load_buffer(dummy_buffer_state, priority_state_path, buffer_dir, vault_uuid)[source]

Loads the buffer state from a checkpoint.

Parameters:
  • dummy_buffer_state (PrioritisedTrajectoryBufferState) – Dummy buffer state. This is required to know the size/data types of the buffer.

  • priority_state_path (str) – Path where the priorities are stored.

  • buffer_dir (str) – The directory where the buffer data is stored.

  • vault_uuid (str) – The unique ID of the vault containing the buffer data.

Returns:

The buffer state that was loaded from disk.

Return type:

PrioritisedTrajectoryBufferState

static save(algorithm, algorithm_state, autorl_config, hp_config, done, c_episode, c_step, train_result, tag=None)[source]

Saves the current state of an AutoRL environment.

Parameters:
  • algorithm (str) – Name of the algorithm.

  • algorithm_state (AlgorithmState) – Current algorithm state.

  • autorl_config (dict) – AutoRL configuration.

  • hp_config (Configuration) – Hyperparameter configuration of the algorithm.

  • done (bool) – Whether the environment is done.

  • c_episode (int) – Current episode of the AutoRL environment.

  • c_step (int) – Current step of the AutoRL environment.

  • train_result (TrainResult | None) – Last training result of the algorithm.

  • tag (str | None, optional) – Checkpoint tag which is appended to the checkpoint name. Defaults to None.

Returns:

Path of the checkpoint.

Return type:

str

static save_buffer(buffer_state, checkpoint_dir, checkpoint_name)[source]

Saves the buffer state of an algorithm.

Parameters:
  • buffer_state (TrajectoryBufferState | PrioritisedTrajectoryBufferState) – Buffer state.

  • checkpoint_dir (str) – Checkpoint directory.

  • checkpoint_name (str) – Checkpoint name.

Returns:

Dictionary containing the identifiers of the individual parts of the buffer, which is required to load the checkpoint.

Return type:

dict
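
The Checkpointer is normally driven by the AutoRL environment rather than called directly. Below is a hedged sketch of that env-level flow, using only the documented checkpoints property and the checkpoint_path parameter of step(); whether checkpoints are actually written depends on the AutoRL configuration passed to AutoRLEnv, which is assumed here rather than shown:

   from arlbench.autorl import AutoRLEnv

   env = AutoRLEnv()                      # checkpointing must be enabled via the config
   obs, info = env.reset()
   action = env.config_space.sample_configuration()
   env.step(action)

   if env.checkpoints:                    # paths of checkpoints created so far
       resume_path = env.checkpoints[-1]
       obs, info = env.reset()
       # Continue training from the checkpointed agent state:
       env.step(action, checkpoint_path=resume_path)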

arlbench.autorl.objectives module

This module contains the objectives for the AutoRL environment.

class arlbench.autorl.objectives.Emissions(*args, **kwargs)[source]

Bases: Objective

Emissions objective for the AutoRL environment. It measures the emissions during training using CodeCarbon.

KEY: str = 'emissions'
RANK: int = 1
static __call__(train_func, objectives, optimize_objectives)[source]

Wraps the training function with the emissions calculation.

Return type:

Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]

static get_spec()[source]

Returns a dictionary containing the specification of the objective.

Return type:

dict

class arlbench.autorl.objectives.Objective(*args, **kwargs)[source]

Bases: ABC

An abstract optimization objective for the AutoRL environment.

It can be wrapped around the training function to calculate the objective. We do this by overriding the __new__() function, which allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.

KEY: str
RANK: int
abstract static __call__(train_func, objectives, optimize_objectives)[source]

Wraps the training function with the objective calculation.

Parameters:
  • train_func (TrainFunc) – Training function to wrap.

  • objectives (dict) – Dictionary to store objective.

  • optimize_objectives (str) – Whether to minimize or maximize the objective.

Returns:

Wrapped training function.

Return type:

TrainFunc

__lt__(other)[source]

Implements “less-than” comparison between two objectives. Used for sorting based on objective rank.

Parameters:

other (Objective) – Other Objective to compare to.

Returns:

Whether this Objective is less than the other Objective.

Return type:

bool

static __new__(cls, *args, **kwargs)[source]

Creates a new instance of this objective and directly wraps the train function.

This is done by first creating an object and subsequently calling self.__call__().

Returns:

Wrapped training function.

Return type:

TrainFunc

abstract static get_spec()[source]

Returns a dictionary containing the specification of the objective.

Returns:

Specification.

Return type:

dict
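
An illustrative sketch of the wrapping pattern an Objective implements. The WallClock class, its KEY/RANK values, and the keys returned by get_spec() are hypothetical, and how ARLBench discovers custom objectives is not covered by this API:

   import time

   from arlbench.autorl.objectives import Objective


   class WallClock(Objective):            # hypothetical objective
       KEY: str = "wall_clock"
       RANK: int = 0

       @staticmethod
       def __call__(train_func, objectives, optimize_objectives):
           def wrapped(*args, **kwargs):
               start = time.perf_counter()
               result = train_func(*args, **kwargs)
               objectives["wall_clock"] = time.perf_counter() - start  # store the measurement
               return result

           return wrapped

       @staticmethod
       def get_spec():
           # Spec keys are assumptions, not the documented format.
           return {"name": "wall_clock", "optimize": "lower"}

Because __new__() creates the object and immediately calls self.__call__(), instantiating such a class, e.g. WallClock(train_func, objectives, "lower"), returns the wrapped training function rather than an Objective instance.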

class arlbench.autorl.objectives.RewardMean(*args, **kwargs)[source]

Bases: Objective

Reward objective for the AutoRL environment. It measures the mean of the last evaluation rewards.

KEY: str = 'reward_mean'
RANK: int = 2
static __call__(train_func, objectives, optimize_objectives)[source]

Wraps the training function with the reward mean calculation.

Return type:

Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]

static get_spec()[source]

Returns a dictionary containing the specification of the objective.

Return type:

dict

class arlbench.autorl.objectives.RewardStd(*args, **kwargs)[source]

Bases: Objective

Reward objective for the AutoRL environment. It measures the standard deviation of the last evaluation rewards.

KEY: str = 'reward_std'
RANK: int = 2
static __call__(train_func, objectives, optimize_objectives)[source]

Wraps the training function with the reward standard deviation calculation.

Return type:

Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]

static get_spec()[source]

Returns a dictionary containing the specification of the objective.

Return type:

dict

class arlbench.autorl.objectives.Runtime(*args, **kwargs)[source]

Bases: Objective

Runtime objective for the AutoRL environment. It measures the total training runtime.

KEY: str = 'runtime'
RANK: int = 0
static __call__(train_func, objectives, optimize_objectives)[source]

Wraps the training function with the runtime calculation.

Return type:

Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]

static get_spec()[source]

Returns a dictionary containing the specification of the objective.

Return type:

dict
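
A hedged sketch of selecting objectives by their KEY strings when constructing the environment; the structure of the config dict (an "objectives" entry holding a list of keys) is an assumption inferred from the objectives property, not a documented schema:

   from arlbench.autorl import AutoRLEnv

   env = AutoRLEnv(config={"objectives": ["reward_mean", "runtime"]})  # assumed config key
   print(env.objectives)                  # configured objective keys
   obs, info = env.reset()
   _, objectives, *_ = env.step(env.config_space.sample_configuration())
   print(objectives)                      # e.g. {"reward_mean": ..., "runtime": ...}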

arlbench.autorl.state_features module

State features for the AutoRL environment.

class arlbench.autorl.state_features.GradInfo(*args, **kwargs)[source]

Bases: StateFeature

Gradient information state feature for the AutoRL environment. It contains the gradient norm during training.

KEY: str = 'grad_info'
static __call__(train_func, state_features)[source]

Wraps the training function with the gradient information calculation.

Return type:

Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]

static get_state_space()[source]

Returns state space.

Return type:

Space

class arlbench.autorl.state_features.StateFeature(*args, **kwargs)[source]

Bases: ABC

An abstract state feature for the AutoRL environment.

It can be wrapped around the training function to calculate the state features. We do this by overriding the __new__() function, which allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.

KEY: str
abstract static __call__(train_func, state_features)[source]

Wraps the training function with the state feature calculation.

Parameters:
  • train_func (TrainFunc) – Training function to wrap.

  • state_features (dict) – Dictionary to store state features.

Returns:

Wrapped training function.

Return type:

TrainFunc

static __new__(cls, *args, **kwargs)[source]

Creates a new instance of this state feature and directly wraps the train function.

This is done by first creating an object and subsequently calling self.__call__().

Returns:

Wrapped training function.

Return type:

TrainFunc

abstract static get_state_space()[source]

Returns a dictionary containing the specification of the state feature.

Returns:

Specification.

Return type:

dict
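
A hedged sketch of how state features surface as observations; the "state_features" config key and the shape of the returned observation are assumptions inferred from the observation_space property and the GradInfo KEY above:

   from arlbench.autorl import AutoRLEnv

   env = AutoRLEnv(config={"state_features": ["grad_info"]})  # assumed config key
   print(env.observation_space)           # gymnasium space built from the state features
   obs, info = env.reset()
   obs, objectives, terminated, truncated, info = env.step(
       env.config_space.sample_configuration()
   )
   print(obs)                             # observation containing the grad_info feature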

Module contents

class arlbench.autorl.AutoRLEnv(config=None)[source]

Bases: Env

Automated Reinforcement Learning (gymnasium-like) Environment.

With each reset, the algorithm state is (re-)initialized. If a checkpoint path is passed to reset, the agent state is initialized with the checkpointed state.

In each step, one iteration of training is performed with the current hyperparameter configuration (= action).

ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}
property action_space: Space

Returns the hyperparameter configuration space as a gymnasium space.

Returns:

Hyperparameter configuration space.

Return type:

gymnasium.spaces.Space

property checkpoints: list[str]

Returns a list of created checkpoints for this AutoRL environment.

Returns:

List of checkpoint paths.

Return type:

list[str]

property config: dict

Returns the AutoRL configuration.

Returns:

AutoRL configuration.

Return type:

dict

property config_space: ConfigurationSpace

Returns the hyperparameter configuration space as a ConfigSpace ConfigurationSpace.

Returns:

Hyperparameter configuration space.

Return type:

ConfigurationSpace

eval(num_eval_episodes)[source]

Evaluates the algorithm using its current training state.

Parameters:

num_eval_episodes (int) – Number of evaluation episodes to run.

Returns:

Array of evaluation returns, one entry per episode.

Return type:

np.ndarray

property hpo_config: Configuration

Returns the current hyperparameter configuration stored in the AutoRL environment.

Returns:

Hyperparameter configuration.

Return type:

Configuration

property objectives: list[str]

Returns configured objectives.

Returns:

List of objectives.

Return type:

list[str]

property observation_space: Space

Returns a gymnasium spaces of state features (observations).

Returns:

Gymnasium space.

Return type:

gymnasium.spaces.Space

reset()[source]

Resets the AutoRL environment and current algorithm state.

Returns:

Empty observation and state information.

Return type:

tuple[ObservationT, InfoT]

step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]

Performs one iteration of RL training.

Parameters:
  • action (Configuration | dict) – Hyperparameter configuration to use for training.

  • checkpoint_path (str | None, optional) – If given, the algorithm state is initialized from this checkpoint before training. Defaults to None.

  • n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.

  • n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.

  • n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.

  • seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.

Raises:

ValueError – Raised if step() is called before reset().

Returns:

State information, objectives, terminated, truncated, additional information.

Return type:

tuple[ObservationT, ObjectivesT, bool, bool, InfoT]