arlbench.autorl.autorl_env¶
Automated Reinforcement Learning Environment.
Classes
AutoRLEnv – Automated Reinforcement Learning (gymnasium-like) Environment.
- class arlbench.autorl.autorl_env.AutoRLEnv(config=None)[source]¶
Bases:
Env
Automated Reinforcement Learning (gymnasium-like) Environment.
With each reset, the algorithm state is (re-)initialized. If a checkpoint path is passed to reset, the agent state is initialized with the checkpointed state.
In each step, one iteration of training is performed with the current hyperparameter configuration (= action).
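A minimal usage sketch of this reset/step cycle (assuming the default configuration via `config=None`; the exact observation and info contents depend on the configured state features and objectives):

```python
# Minimal sketch: reset the environment, then run one training iteration
# with a sampled hyperparameter configuration as the action.
from arlbench.autorl.autorl_env import AutoRLEnv

env = AutoRLEnv()                       # default AutoRL configuration (config=None)
obs, info = env.reset()                 # (re-)initializes the algorithm state

action = env.config_space.sample_configuration()  # one hyperparameter configuration
obs, objectives, terminated, truncated, info = env.step(action)
```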
- property action_space: Space¶
Returns the hyperparameter configuration spaces as gymnasium space.
- Returns:
Hyperparameter configuration space.
- Return type:
gymnasium.spaces.Space
- property checkpoints: list[str]¶
Returns a list of created checkpoints for this AutoRL environment.
- Returns:
List of checkpoint paths.
- Return type:
list[str]
- property config: dict¶
Returns the AutoRL configuration.
- Returns:
AutoRL configuration.
- Return type:
dict
- property config_space: ConfigurationSpace¶
Returns the hyperparameter configuration spaces as ConfigSpace.
- Returns:
Hyperparameter configuration space.
- Return type:
ConfigurationSpace
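Continuing the sketch above, the same hyperparameter search space is exposed in two views: as a gymnasium space (`action_space`) and as a ConfigSpace object (`config_space`). Both sampling calls below are the standard gymnasium and ConfigSpace APIs:

```python
# Two views of the hyperparameter search space.
gym_sample = env.action_space.sample()                # gymnasium.spaces sample
cs_sample = env.config_space.sample_configuration()   # ConfigSpace Configuration

print(env.config_space)  # shows the tunable hyperparameters and their ranges
```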
- eval(num_eval_episodes)[source]¶
Evaluates the algorithm using its current training state.
- Parameters:
num_eval_episodes (int) – Number of evaluation episodes to run.
- Returns:
Array of evaluation returns, one per episode.
- Return type:
np.ndarray
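A brief sketch of evaluating the current training state (continuing the example above; aggregating the returned array is left to the caller):

```python
import numpy as np

# Evaluate the current agent for 10 episodes and summarize the returns.
episode_returns = env.eval(num_eval_episodes=10)
print(f"mean return: {np.mean(episode_returns):.2f} "
      f"(std {np.std(episode_returns):.2f} over {len(episode_returns)} episodes)")
```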
- property hpo_config: Configuration¶
Returns the current hyperparameter configuration stored in the AutoRL environment.
- Returns:
Hyperparameter configuration.
- Return type:
Configuration
- property objectives: list[str]¶
Returns configured objectives.
- Returns:
List of objectives.
- Return type:
list[str]
- property observation_space: Space¶
Returns the gymnasium space of state features (observations).
- Returns:
Gymnasium space.
- Return type:
gymnasium.spaces.Space
- reset()[source]¶
Resets the AutoRL environment and current algorithm state.
- Returns:
Empty observation and state information.
- Return type:
tuple[ObservationT, InfoT]
- step(action, checkpoint_path=None, n_total_timesteps=None, n_eval_steps=None, n_eval_episodes=None, seed=None)[source]¶
Performs one iteration of RL training.
- Parameters:
action (Configuration | dict) – Hyperparameter configuration to use for training.
n_total_timesteps (int | None, optional) – Number of total training steps. Defaults to None.
n_eval_steps (int | None, optional) – Number of evaluations during training. Defaults to None.
n_eval_episodes (int | None, optional) – Number of episodes to run per evaluation during training. Defaults to None.
seed (int | None, optional) – Random seed. Defaults to None. If None, seed of the AutoRL environment is used.
- Raises:
ValueError – Raised if step() is called before reset().
- Returns:
State information, objectives, terminated, truncated, additional information.
- Return type:
tuple[ObservationT, ObjectivesT, bool, bool, InfoT]
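A sketch of a step call with the optional per-step overrides (continuing the example above; the numeric values are placeholders, not recommended settings):

```python
# One training iteration with explicit overrides for this step only.
action = env.config_space.sample_configuration()  # Configuration or dict is accepted
obs, objectives, terminated, truncated, info = env.step(
    action,
    n_total_timesteps=100_000,  # total training steps for this iteration
    n_eval_episodes=10,         # episodes per evaluation during training
    seed=42,                    # overrides the AutoRL environment's seed
)
print(env.objectives, objectives)  # configured objective names and their values
```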