arlbench.autorl package¶
Submodules¶
arlbench.autorl.autorl_env module¶
Automated Reinforcement Learning Environment.
- class arlbench.autorl.autorl_env.AutoRLEnv(config=None)[source]¶
Bases:
Env
Automated Reinforcement Learning (gymnasium-like) Environment.
With each reset, the algorithm state is (re-)initialized. If a checkpoint path is provided, the agent state is initialized with the checkpointed state.
In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
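A minimal usage sketch (assumes the default configuration and the import path shown here; ConfigSpace's sample_configuration() is used to draw one hyperparameter configuration):

```python
from arlbench.autorl import AutoRLEnv

env = AutoRLEnv()                                 # default AutoRL configuration
obs, info = env.reset()                           # (re-)initialize the algorithm state
action = env.config_space.sample_configuration()  # one hyperparameter configuration
obs, objectives, terminated, truncated, info = env.step(action)
```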
- ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}¶
- property action_space: Space¶
Returns the hyperparameter configuration space as a gymnasium space.
- Returns:
Hyperparameter configuration space.
- Return type:
gymnasium.spaces.Space
- property checkpoints: list[str]¶
Returns a list of created checkpoints for this AutoRL environment.
- Returns:
List of checkpoint paths.
- Return type:
list[str]
- property config: dict¶
Returns the AutoRL configuration.
- Returns:
AutoRL configuration.
- Return type:
dict
- property config_space: ConfigurationSpace¶
Returns the hyperparameter configuration space as a ConfigurationSpace.
- Returns:
Hyperparameter configuration space.
- Return type:
ConfigurationSpace
- eval(num_eval_episodes)[source]¶
Evaluates the algorithm using its current training state.
- Parameters:
num_eval_episodes (int) – Number of evaluation episodes to run.
- Returns:
Array of evaluation returns, one per episode.
- Return type:
np.ndarray
- property hpo_config: Configuration¶
Returns the current hyperparameter configuration stored in the AutoRL environment.
- Returns:
Hyperparameter configuration.
- Return type:
Configuration
- property objectives: list[str]¶
Returns configured objectives.
- Returns:
List of objectives.
- Return type:
list[str]
- property observation_space: Space¶
Returns the gymnasium space of state features (observations).
- Returns:
Gymnasium space.
- Return type:
gymnasium.spaces.Space
- reset()[source]¶
Resets the AutoRL environment and current algorithm state.
- Returns:
Empty observation and state information.
- Return type:
tuple[ObservationT, InfoT]
- step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
Performs one iteration of RL training.
- Parameters:
action (Configuration | dict) – Hyperparameter configuration to use for training.
checkpoint_path (str | None, optional) – Path of a checkpoint to initialize the algorithm state from. Defaults to None.
n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.
- Raises:
ValueError – Error is raised if step() is called before reset() was called.
- Returns:
State information, objectives, terminated, truncated, additional information.
- Return type:
tuple[ObservationT, ObjectivesT, bool, bool, InfoT]
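Continuing the sketch above, a single step can override the training budget and evaluation settings via the documented keyword arguments (the values below are illustrative, not defaults):

```python
obs, objectives, terminated, truncated, info = env.step(
    action,
    n_total_timesteps=100_000,  # total training steps for this iteration
    n_eval_steps=10,            # number of evaluations during training
    n_eval_episodes=5,          # episodes per evaluation
    seed=42,                    # overrides the AutoRL environment seed for this step
)
returns = env.eval(num_eval_episodes=10)  # np.ndarray of per-episode returns
```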
arlbench.autorl.checkpointing module¶
Contains all checkpointing-related methods for the AutoRL environment.
- class arlbench.autorl.checkpointing.Checkpointer[source]¶
Bases:
object
Contains all checkpointing-related methods for the AutoRL environment.
- MRP_FILE = 'max_recorded_priority.npy'¶
- NODES_FILE = 'nodes.npy'¶
- SCALARS_FILE = 'scalars.json'¶
- static load(checkpoint_path, algorithm_state)[source]¶
Loads an AutoRL environment checkpoint.
- Parameters:
checkpoint_path (str) – Path of the checkpoint.
algorithm_state (AlgorithmState) – Current algorithm state, certain attributes will be overridden by the checkpoint.
- Returns:
Common AutoRL environment attributes as well as a dictionary of the restored algorithm state: (hp_config, c_step, c_episode), algorithm_kw_args
- Return type:
tuple[tuple[dict[str, Any], int, int], dict]
- static load_buffer(dummy_buffer_state, priority_state_path, buffer_dir, vault_uuid)[source]¶
Loads the buffer state from a checkpoint.
- Parameters:
dummy_buffer_state (PrioritisedTrajectoryBufferState) – Dummy buffer state. This is required to know the size/data types of the buffer.
priority_state_path (str) – Path where the priorities are stored.
buffer_dir (str) – The directory where the buffer data is stored.
vault_uuid (str) – The unique ID of the vault containing the buffer data.
- Returns:
The buffer state that was loaded from disk.
- Return type:
PrioritisedTrajectoryBufferState
- static save(algorithm, algorithm_state, autorl_config, hp_config, done, c_episode, c_step, train_result, tag=None)[source]¶
Saves the current state of an AutoRL environment.
- Parameters:
algorithm (str) – Name of the algorithm.
algorithm_state (AlgorithmState) – Current algorithm state.
autorl_config (dict) – AutoRL configuration.
hp_config (Configuration) – Hyperparameter configuration of the algorithm.
done (bool) – Whether the environment is done.
c_episode (int) – Current episode of the AutoRL environment.
c_step (int) – Current step of the AutoRL environment.
train_result (TrainResult | None) – Last training result of the algorithm.
tag (str | None, optional) – Checkpoint tag which is appended to the checkpoint name. Defaults to None.
- Returns:
Path of the checkpoint.
- Return type:
str
- static save_buffer(buffer_state, checkpoint_dir, checkpoint_name)[source]¶
Saves the buffer state of an algorithm.
- Parameters:
buffer_state (TrajectoryBufferState | PrioritisedTrajectoryBufferState) – Buffer state.
checkpoint_dir (str) – Checkpoint directory.
checkpoint_name (str) – Checkpoint name.
- Returns:
Dictionary containing the identifiers of the individual parts of the buffer. Required to load the checkpoint.
- Return type:
dict
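The Checkpointer is normally driven by the AutoRL environment rather than called directly. Below is a sketch of resuming from a previously written checkpoint through AutoRLEnv.step(); the semantics of checkpoint_path and the variable ckpt_path (e.g. the last entry of the checkpoints property from an earlier run) are assumptions for illustration:

```python
obs, info = env.reset()  # fresh algorithm state
obs, objectives, terminated, truncated, info = env.step(
    action,
    checkpoint_path=ckpt_path,  # assumed: load the checkpointed state before training
)
```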
arlbench.autorl.objectives module¶
This module contains the objectives for the AutoRL environment.
- class arlbench.autorl.objectives.Emissions(*args, **kwargs)[source]¶
Bases:
Objective
Emissions objective for the AutoRL environment. It measures the carbon emissions during training using CodeCarbon.
- KEY: str = 'emissions'¶
- RANK: int = 1¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the emissions calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.Objective(*args, **kwargs)[source]¶
Bases:
ABC
An abstract optimization objective for the AutoRL environment.
It can be wrapped around the training function to calculate the objective. We do this by overriding the __new__() function. This allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.
- KEY: str¶
- RANK: int¶
- abstract static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the objective calculation.
- Parameters:
train_func (TrainFunc) – Training function to wrap.
objectives (dict) – Dictionary to store objective.
optimize_objectives (str) – Whether to minimize or maximize the objective.
- Returns:
Wrapped training function.
- Return type:
TrainFunc
- __lt__(other)[source]¶
Implements “less-than” comparison between two objectives. Used for sorting based on objective rank.
- Parameters:
other (Objective) – Other Objective to compare to.
- Returns:
Whether this Objective is less than the other Objective.
- Return type:
bool
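To illustrate the wrapping pattern described above, here is a hypothetical subclass; it is not part of the arlbench API and essentially mirrors the built-in Runtime objective:

```python
import time

from arlbench.autorl.objectives import Objective


class WallClock(Objective):
    """Hypothetical objective: wall-clock time of one training call."""

    KEY: str = "wall_clock"
    RANK: int = 3

    @staticmethod
    def __call__(train_func, objectives, optimize_objectives):
        def wrapped(*args, **kwargs):
            start = time.time()
            result = train_func(*args, **kwargs)
            # store the measured value under this objective's key
            objectives[WallClock.KEY] = time.time() - start
            return result

        return wrapped
```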
- class arlbench.autorl.objectives.RewardMean(*args, **kwargs)[source]¶
Bases:
Objective
Reward objective for the AutoRL environment. It measures the mean of the last evaluation rewards.
- KEY: str = 'reward_mean'¶
- RANK: int = 2¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the reward mean calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.RewardStd(*args, **kwargs)[source]¶
Bases:
Objective
Reward objective for the AutoRL environment. It measures the standard deviation of the last evaluation rewards.
- KEY: str = 'reward_std'¶
- RANK: int = 2¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the reward standard deviation calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.objectives.Runtime(*args, **kwargs)[source]¶
Bases:
Objective
Runtime objective for the AutoRL environment. It measures the total training runtime.
- KEY: str = 'runtime'¶
- RANK: int = 0¶
- static __call__(train_func, objectives, optimize_objectives)[source]¶
Wraps the training function with the runtime calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
arlbench.autorl.state_features module¶
State features for the AutoRL environment.
- class arlbench.autorl.state_features.GradInfo(*args, **kwargs)[source]¶
Bases:
StateFeature
Gradient information state feature for the AutoRL environment. It contains the gradient norm during training.
- KEY: str = 'grad_info'¶
- static __call__(train_func, state_features)[source]¶
Wraps the training function with the gradient information calculation.
- Return type:
Callable[[DQNRunnerState | PPORunnerState | SACRunnerState, PrioritisedTrajectoryBufferState, int | None, int | None, int | None], tuple[DQNState, DQNTrainingResult] | tuple[PPOState, PPOTrainingResult] | tuple[SACState, SACTrainingResult]]
- class arlbench.autorl.state_features.StateFeature(*args, **kwargs)[source]¶
Bases:
ABC
An abstract state feature for the AutoRL environment.
It can be wrapped around the training function to calculate the state features. We do this by overriding the __new__() function. This allows us to imitate the behaviour of a plain function while keeping the advantages of a static class.
- KEY: str¶
- abstract static __call__(train_func, state_features)[source]¶
Wraps the training function with the state feature calculation.
- Parameters:
train_func (TrainFunc) – Training function to wrap.
state_features (dict) – Dictionary to store state features.
- Returns:
Wrapped training function.
- Return type:
TrainFunc
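Analogously, a hypothetical custom state feature can wrap the training function and record information from the training result; the class and the attribute accessed below are assumptions for illustration, not part of the arlbench API:

```python
from arlbench.autorl.state_features import StateFeature


class ReturnSummary(StateFeature):
    """Hypothetical state feature: exposes the last evaluation returns."""

    KEY: str = "return_summary"

    @staticmethod
    def __call__(train_func, state_features):
        def wrapped(*args, **kwargs):
            algorithm_state, train_result = train_func(*args, **kwargs)
            # the attribute name 'eval_rewards' is assumed, not a documented field
            state_features[ReturnSummary.KEY] = getattr(train_result, "eval_rewards", None)
            return algorithm_state, train_result

        return wrapped
```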
Module contents¶
- class arlbench.autorl.AutoRLEnv(config=None)[source]¶
Bases:
Env
Automated Reinforcement Learning (gymnasium-like) Environment.
With each reset, the algorithm state is (re-)initialized. If a checkpoint path is provided, the agent state is initialized with the checkpointed state.
In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
- ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}¶
- property action_space: Space¶
Returns the hyperparameter configuration space as a gymnasium space.
- Returns:
Hyperparameter configuration space.
- Return type:
gymnasium.spaces.Space
- property checkpoints: list[str]¶
Returns a list of created checkpoints for this AutoRL environment.
- Returns:
List of checkpoint paths.
- Return type:
list[str]
- property config: dict¶
Returns the AutoRL configuration.
- Returns:
AutoRL configuration.
- Return type:
dict
- property config_space: ConfigurationSpace¶
Returns the hyperparameter configuration space as a ConfigurationSpace.
- Returns:
Hyperparameter configuration space.
- Return type:
ConfigurationSpace
- eval(num_eval_episodes)[source]¶
Evaluates the algorithm using its current training state.
- Parameters:
num_eval_episodes (int) – Number of evaluation episodes to run.
- Returns:
Array of evaluation returns, one per episode.
- Return type:
np.ndarray
- property hpo_config: Configuration¶
Returns the current hyperparameter configuration stored in the AutoRL environment.
- Returns:
Hyperparameter configuration.
- Return type:
Configuration
- property objectives: list[str]¶
Returns configured objectives.
- Returns:
List of objectives.
- Return type:
list[str]
- property observation_space: Space¶
Returns the gymnasium space of state features (observations).
- Returns:
Gymnasium space.
- Return type:
gymnasium.spaces.Space
- reset()[source]¶
Resets the AutoRL environment and current algorithm state.
- Returns:
Empty observation and state information.
- Return type:
tuple[ObservationT, InfoT]
- step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
Performs one iteration of RL training.
- Parameters:
action (Configuration | dict) – Hyperparameter configuration to use for training.
checkpoint_path (str | None, optional) – Path of a checkpoint to initialize the algorithm state from. Defaults to None.
n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.
- Raises:
ValueError – Error is raised if step() is called before reset() was called.
- Returns:
State information, objectives, terminated, truncated, additional information.
- Return type:
tuple[ObservationT, ObjectivesT, bool, bool, InfoT]