arlbench.autorl package¶
Submodules¶
arlbench.autorl.autorl_env module¶
Automated Reinforcement Learning Environment.
- class arlbench.autorl.autorl_env.AutoRLEnv(config=None)[source]¶
Bases:
Env
Automated Reinforcement Learning (gymnasium-like) Environment.
With each reset, the algorithm state is (re-)initialized. If a checkpoint path is provided, the agent state is initialized with the checkpointed state.
In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
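A minimal usage sketch (assumes the default configuration and the import path shown here; ConfigSpace's sample_configuration() is used to draw one hyperparameter configuration):

```python
from arlbench.autorl import AutoRLEnv

env = AutoRLEnv()                                 # default AutoRL configuration
obs, info = env.reset()                           # (re-)initialize the algorithm state
action = env.config_space.sample_configuration()  # one hyperparameter configuration
obs, objectives, terminated, truncated, info = env.step(action)
```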
- ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}¶
- property action_space: Space¶
Returns the hyperparameter configuration space as a gymnasium space.
- Returns:
Hyperparameter configuration space.
- Return type:
gymnasium.spaces.Space
- property checkpoints: list[str]¶
Returns a list of created checkpoints for this AutoRL environment.
- Returns:
List of checkpoint paths.
- Return type:
list[str]
- property config: dict¶
Returns the AutoRL configuration.
- Returns:
AutoRL configuration.
- Return type:
dict
- property config_space: ConfigurationSpace¶
Returns the hyperparameter configuration space as a ConfigurationSpace.
- Returns:
Hyperparameter configuration space.
- Return type:
ConfigurationSpace
- eval(num_eval_episodes)[source]¶
Evaluates the algorithm using its current training state.
- Parameters:
num_eval_episodes (int) – Number of evaluation episodes to run.
- Returns:
Array of evaluation returns, one per episode.
- Return type:
np.ndarray
- property hpo_config: Configuration¶
Returns the current hyperparameter configuration stored in the AutoRL environment.
- Returns:
Hyperparameter configuration.
- Return type:
Configuration
- property objectives: list[str]¶
Returns configured objectives.
- Returns:
List of objectives.
- Return type:
list[str]
- property observation_space: Space¶
Returns the gymnasium space of state features (observations).
- Returns:
Gymnasium space.
- Return type:
gymnasium.spaces.Space
- reset()[source]¶
Resets the AutoRL environment and current algorithm state.
- Returns:
Empty observation and state information.
- Return type:
tuple[ObservationT, InfoT]
- step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
Performs one iteration of RL training.
- Parameters:
action (Configuration | dict) – Hyperparameter configuration to use for training.
checkpoint_path (str | None, optional) – Path of a checkpoint to initialize the algorithm state from. Defaults to None.
n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.
- Raises:
ValueError – Error is raised if step() is called before reset() was called.
- Returns:
State information, objectives, terminated, truncated, additional information.
- Return type:
tuple[ObservationT, ObjectivesT, bool, bool, InfoT]
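Continuing the sketch above, a single step can override the training budget and evaluation settings via the documented keyword arguments (the values below are illustrative, not defaults):

```python
obs, objectives, terminated, truncated, info = env.step(
    action,
    n_total_timesteps=100_000,  # total training steps for this iteration
    n_eval_steps=10,            # number of evaluations during training
    n_eval_episodes=5,          # episodes per evaluation
    seed=42,                    # overrides the AutoRL environment seed for this step
)
returns = env.eval(num_eval_episodes=10)  # np.ndarray of per-episode returns
```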
arlbench.autorl.checkpointing module¶
Contains all checkpointing-related methods for the AutoRL environment.
- class arlbench.autorl.checkpointing.Checkpointer[source]¶
Bases:
object
Contains all checkpointing-related methods for the AutoRL environment.
- MRP_FILE = 'max_recorded_priority.npy'¶
- NODES_FILE = 'nodes.npy'¶
- SCALARS_FILE = 'scalars.json'¶
- static load(checkpoint_path, algorithm_state)[source]¶
Loads an AutoRL environment checkpoint.
- Parameters:
checkpoint_path (str) – Path of the checkpoint.
algorithm_state (AlgorithmState) – Current algorithm state, certain attributes will be overridden by the checkpoint.
- Returns:
Common AutoRL environment attributes as well as a dictionary of the restored algorithm state: (hp_config, c_step, c_episode), algorithm_kw_args
- Return type:
tuple[tuple[dict[str, Any], int, int], dict]
- static load_buffer(dummy_buffer_state, priority_state_path, buffer_dir, vault_uuid)[source]¶
Loads the buffer state from a checkpoint.
- Parameters:
dummy_buffer_state (PrioritisedTrajectoryBufferState) – Dummy buffer state. This is required to know the size/data types of the buffer.
priority_state_path (str) – Path where the priorities are stored.
buffer_dir (str) – The directory where the buffer data is stored.
vault_uuid (str) – The unique ID of the vault containing the buffer data.
- Returns:
The buffer state that was loaded from disk.
- Return type:
PrioritisedTrajectoryBufferState
- static save(algorithm, algorithm_state, autorl_config, hp_config, done, c_episode, c_step, train_result, tag=None)[source]¶
Saves the current state of an AutoRL environment.
- Parameters:
algorithm (str) – Name of the algorithm.
algorithm_state (AlgorithmState) – Current algorithm state.
autorl_config (dict) – AutoRL configuration.
hp_config (Configuration) – Hyperparameter configuration of the algorithm.
done (bool) – Whether the environment is done.
c_episode (int) – Current episode of the AutoRL environment.
c_step (int) – Current step of the AutoRL environment.
train_result (TrainResult | None) – Last training result of the algorithm.
tag (str | None, optional) – Checkpoint tag which is appended to the checkpoint name. Defaults to None.
- Returns:
Path of the checkpoint.
- Return type:
str
- static save_buffer(buffer_state, checkpoint_dir, checkpoint_name)[source]¶
Saves the buffer state of an algorithm.
- Parameters:
buffer_state (TrajectoryBufferState | PrioritisedTrajectoryBufferState) – Buffer state.
checkpoint_dir (str) – Checkpoint directory.
checkpoint_name (str) – Checkpoint name.
- Returns:
Dictionary containing the identifiers of the individual parts of the buffer. Required to load the checkpoint.
- Return type:
dict
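The Checkpointer is normally driven by the AutoRL environment rather than called directly. Below is a sketch of resuming from a previously written checkpoint through AutoRLEnv.step(); the semantics of checkpoint_path and the variable ckpt_path (e.g. the last entry of the checkpoints property from an earlier run) are assumptions for illustration:

```python
obs, info = env.reset()  # fresh algorithm state
obs, objectives, terminated, truncated, info = env.step(
    action,
    checkpoint_path=ckpt_path,  # assumed: load the checkpointed state before training
)
```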
arlbench.autorl.objectives module¶
This module contains the objectives for the AutoRL environment.
- class arlbench.autorl.objectives.Emissions(*args, **kwargs)[source]¶
Bases:
Objective
Emissions objective for the AutoRL environment. It measures the carbon emissions during training using CodeCarbon.
- KEY: str = 'emissions'¶
- RANK: int = 1¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the emissions calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.Objective(*args, **kwargs)[source]¶
Bases:
ABC
An abstract optimization objective for the AutoRL environment.
It can be wrapped around the training function to calculate the objective. We do this by overriding the __new__() function. This allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.
- KEY: str¶
- RANK: int¶
- abstract static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the objective calculation.
- Parameters:
train_func (TrainFunc) – Training function to wrap.
objectives (dict) – Dictionary to store objective.
optimize_objectives (str) – Whether to minimize or maximize the objective.
- Returns:
Wrapped training function.
- Return type:
TrainFunc
- __lt__(other)[source]¶
Implements “less-than” comparison between two objectives. Used for sorting based on objective rank.
- Parameters:
other (Objective) – Other Objective to compare to.
- Returns:
Whether this Objective is less than the other Objective.
- Return type:
bool
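To illustrate the wrapping pattern described above, here is a hypothetical subclass; it is not part of the arlbench API and essentially mirrors the built-in Runtime objective:

```python
import time

from arlbench.autorl.objectives import Objective


class WallClock(Objective):
    """Hypothetical objective: wall-clock time of one training call."""

    KEY: str = "wall_clock"
    RANK: int = 3

    @staticmethod
    def __call__(train_func, objectives, optimize_objectives):
        def wrapped(*args, **kwargs):
            start = time.time()
            result = train_func(*args, **kwargs)
            # store the measured value under this objective's key
            objectives[WallClock.KEY] = time.time() - start
            return result

        return wrapped
```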
- class arlbench.autorl.objectives.RewardMean(*args, **kwargs)[source]¶
Bases:
Objective
Reward objective for the AutoRL environment. It measures the mean of the last evaluation rewards.
- KEY: str = 'reward_mean'¶
- RANK: int = 2¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the reward mean calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.RewardStd(*args, **kwargs)[source]¶
Bases:
Objective
Reward objective for the AutoRL environment. It measures the standard deviation of the last evaluation rewards.
- KEY: str = 'reward_std'¶
- RANK: int = 2¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the reward standard deviation calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.Runtime(*args, **kwargs)[source]¶
Bases:
Objective
Runtime objective for the AutoRL environment. It measures the total training runtime.
- KEY: str = 'runtime'¶
- RANK: int = 0¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the runtime calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
arlbench.autorl.state_features module¶
State features for the AutoRL environment.
- class arlbench.autorl.state_features.GradInfo(*args, **kwargs)[source]¶
Bases:
StateFeature
Gradient information state feature for the AutoRL environment. It contains the gradient norm during training.
- KEY: str = 'grad_info'¶
- static __call__(train_func, state_features)[source]¶
Wraps the training function with the gradient information calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.state_features.StateFeature(*args, **kwargs)[source]¶
Bases:
ABC
An abstract state feature for the AutoRL environment.
It can be wrapped around the training function to calculate the state features. We do this by overriding the __new__() function. This allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.
- KEY: str¶
- abstract static __call__(train_func, state_features)[source]¶
Wraps the training function with the state feature calculation.
- Parameters:
train_func (TrainFunc) – Training function to wrap.
state_features (dict) – Dictionary to store state features.
- Returns:
Wrapped training function.
- Return type:
TrainFunc
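Analogously, a hypothetical custom state feature can wrap the training function and record information from the training result; the class and the attribute accessed below are assumptions for illustration, not part of the arlbench API:

```python
from arlbench.autorl.state_features import StateFeature


class ReturnSummary(StateFeature):
    """Hypothetical state feature: exposes the last evaluation returns."""

    KEY: str = "return_summary"

    @staticmethod
    def __call__(train_func, state_features):
        def wrapped(*args, **kwargs):
            algorithm_state, train_result = train_func(*args, **kwargs)
            # the attribute name 'eval_rewards' is assumed, not a documented field
            state_features[ReturnSummary.KEY] = getattr(train_result, "eval_rewards", None)
            return algorithm_state, train_result

        return wrapped
```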
Module contents¶
- class arlbench.autorl.AutoRLEnv(config=None)[source]¶
Bases:
Env
Automated Reinforcement Learning (gymnasium-like) Environment.
With each reset, the algorithm state is (re-)initialized. If a checkpoint path is provided, the agent state is initialized with the checkpointed state.
In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
- ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}¶
- property action_space: Space¶
Returns the hyperparameter configuration space as a gymnasium space.
- Returns:
Hyperparameter configuration space.
- Return type:
gymnasium.spaces.Space
- property checkpoints: list[str]¶
Returns a list of created checkpoints for this AutoRL environment.
- Returns:
List of checkpoint paths.
- Return type:
list[str]
- property config: dict¶
Returns the AutoRL configuration.
- Returns:
AutoRL configuration.
- Return type:
dict
- property config_space: ConfigurationSpace¶
Returns the hyperparameter configuration space as a ConfigurationSpace.
- Returns:
Hyperparameter configuration space.
- Return type:
ConfigurationSpace
- eval(num_eval_episodes)[source]¶
Evaluates the algorithm using its current training state.
- Parameters:
num_eval_episodes (int) – Number of evaluation episodes to run.
- Returns:
Array of evaluation returns, one per episode.
- Return type:
np.ndarray
- property hpo_config: Configuration¶
Returns the current hyperparameter configuration stored in the AutoRL environment.
- Returns:
Hyperparameter configuration.
- Return type:
Configuration
- property objectives: list[str]¶
Returns configured objectives.
- Returns:
List of objectives.
- Return type:
list[str]
- property observation_space: Space¶
Returns the gymnasium space of state features (observations).
- Returns:
Gymnasium space.
- Return type:
gymnasium.spaces.Space
- reset()[source]¶
Resets the AutoRL environment and current algorithm state.
- Returns:
Empty observation and state information.
- Return type:
tuple[ObservationT, InfoT]
- step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
Performs one iteration of RL training.
- Parameters:
action (Configuration | dict) – Hyperparameter configuration to use for training.
checkpoint_path (str | None, optional) – Path of a checkpoint to initialize the algorithm state from. Defaults to None.
n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.
- Raises:
ValueError – Error is raised if step() is called before reset() was called.
- Returns:
State information, objectives, terminated, truncated, additional information.
- Return type:
tuple[ObservationT, ObjectivesT, bool, bool, InfoT]