arlbench package¶
Subpackages¶
- arlbench.autorl package
  - Submodules
    - arlbench.autorl.autorl_env module
    - arlbench.autorl.checkpointing module
    - arlbench.autorl.objectives module
    - arlbench.autorl.state_features module
  - Module contents
- arlbench.core package
  - Subpackages
    - arlbench.core.algorithms package
    - arlbench.core.environments package
      - Submodules
        - arlbench.core.environments.autorl_env module
        - arlbench.core.environments.brax_env module
        - arlbench.core.environments.envpool_env module
        - arlbench.core.environments.gymnasium_env module
        - arlbench.core.environments.gymnax_env module
        - arlbench.core.environments.make_env module
        - arlbench.core.environments.xland_env module
      - Module contents
    - arlbench.core.wrappers package
  - Submodules
    - arlbench.core.running_statistics module
  - Module contents
- arlbench.utils package
Submodules¶
arlbench.arlbench module¶
This module provides a function to run ARLBench using a given config.
Module contents¶
Top-level package for ARLBench.
- class arlbench.AutoRLEnv(config=None)[source]¶
  Bases: Env
  Automated Reinforcement Learning (gymnasium-like) environment.
  With each reset, the algorithm state is (re-)initialized. If a checkpoint path is passed to reset, the agent state is initialized from the checkpointed state.
  In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
  ALGORITHMS = {'dqn': <class 'arlbench.core.algorithms.dqn.dqn.DQN'>, 'ppo': <class 'arlbench.core.algorithms.ppo.ppo.PPO'>, 'sac': <class 'arlbench.core.algorithms.sac.sac.SAC'>}¶
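  As a usage sketch, the environment can be constructed with a plain dict config. The keys shown here ("algorithm", "env", "env_framework", "env_name") are assumptions about the config schema, not verified against the shipped defaults:

  ```python
  from arlbench import AutoRLEnv

  # Hypothetical configuration selecting one of the supported algorithms
  # (see ALGORITHMS above). Keys and nesting are assumptions; unspecified
  # options fall back to the environment's defaults.
  config = {
      "algorithm": "ppo",
      "env": {"env_framework": "gymnax", "env_name": "CartPole-v1"},
  }
  env = AutoRLEnv(config=config)

  # The configured objectives and hyperparameter space are now available.
  print(env.objectives)
  print(env.config_space)
  ```

  Passing `config=None` falls back to the package defaults.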
 - property action_space: Space¶
   Returns the hyperparameter configuration space as a gymnasium space.
   - Returns: Hyperparameter configuration space.
   - Return type: gymnasium.spaces.Space
 
 - property checkpoints: list[str]¶
   Returns a list of created checkpoints for this AutoRL environment.
   - Returns: List of checkpoint paths.
   - Return type: list[str]
 
 - property config: dict¶
   Returns the AutoRL configuration.
   - Returns: AutoRL configuration.
   - Return type: dict
 
 - property config_space: ConfigurationSpace¶
   Returns the hyperparameter configuration space as a ConfigSpace ConfigurationSpace.
   - Returns: Hyperparameter configuration space.
   - Return type: ConfigurationSpace
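   For example, config_space can be used to sample a valid hyperparameter configuration to pass to step() as an action (a sketch assuming a default-constructed environment):

   ```python
   from arlbench import AutoRLEnv

   env = AutoRLEnv()  # default configuration

   # Sample one valid hyperparameter configuration from the ConfigSpace
   # object; dict() converts it into plain name/value pairs.
   hp_config = env.config_space.sample_configuration()
   action = dict(hp_config)
   print(action)
   ```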
 
 - eval(num_eval_episodes)[source]¶
   Evaluates the algorithm using its current training state.
   - Parameters: num_eval_episodes (int) – Number of evaluation episodes to run.
   - Returns: Array of evaluation returns, one per episode.
   - Return type: np.ndarray
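   A short sketch of evaluating after one training iteration; the returned array has one entry per evaluation episode (assuming a default-constructed environment):

   ```python
   import numpy as np

   from arlbench import AutoRLEnv

   env = AutoRLEnv()
   env.reset()

   # One training iteration with a randomly sampled configuration.
   env.step(dict(env.config_space.sample_configuration()))

   # Evaluate the current training state over five episodes.
   returns = env.eval(num_eval_episodes=5)
   print(float(np.mean(returns)), float(np.std(returns)))
   ```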
 
 - get_algorithm_init_kwargs(init_rng)[source]¶
   Returns the algorithm initialization parameters.
   - Parameters: init_rng – Random key used to initialize the algorithm.
   - Returns: Dictionary of algorithm initialization parameters.
   - Return type: Dict
 
 - property hpo_config: Configuration¶
   Returns the current hyperparameter configuration stored in the AutoRL environment.
   - Returns: Hyperparameter configuration.
   - Return type: Configuration
 
 - property objectives: list[str]¶
   Returns the configured objectives.
   - Returns: List of objectives.
   - Return type: list[str]
 
 - property observation_space: Space¶
   Returns the gymnasium space of state features (observations).
   - Returns: Gymnasium space.
   - Return type: gymnasium.spaces.Space
 
 - reset()[source]¶
   Resets the AutoRL environment and the current algorithm state.
   - Returns: Empty observation and state information.
   - Return type: tuple[ObservationT, InfoT]
 
 - step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
   Performs one iteration of RL training.
   - Parameters:
     - action (Configuration | dict) – Hyperparameter configuration to use for training.
     - checkpoint_path (str | None, optional) – Path to a checkpoint to initialize the agent state from. Defaults to None.
     - n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
     - n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
     - n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
     - seed (int | None, optional) – Random seed. Defaults to None. If None, the seed of the AutoRL environment is used.
   - Raises: ValueError – Raised if step() is called before reset().
   - Returns: State information, objectives, terminated, truncated, additional information.
   - Return type: tuple[ObservationT, ObjectivesT, bool, bool, InfoT]
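
   Putting reset() and step() together, a minimal AutoRL loop might look like the following sketch. The number of iterations is illustrative, and the objective values returned from step() depend on the configured objectives:

   ```python
   from arlbench import AutoRLEnv

   env = AutoRLEnv()
   obs, info = env.reset()  # (re-)initializes the algorithm state

   # Train for a few iterations with a fixed, randomly sampled
   # hyperparameter configuration (= action).
   action = dict(env.config_space.sample_configuration())
   for _ in range(3):
       obs, objectives, terminated, truncated, info = env.step(action)
       if terminated or truncated:
           break

   print(objectives)
   ```

   Calling step() again with a different action continues training the same agent under the new hyperparameters, which is the core AutoRL use case of this environment.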