Options for ARLBench
A training run in ARLBench is configured on two levels. The lower level is the configuration set by the AutoRL tool being benchmarked, while the upper level defines the setting in which that tool is tested. The upper-level configuration is set via the ‘autorl’ keys in the configuration file. These are the available options:
seed: The seed for the random number generator
env_framework: Environment framework to use. Currently supported: gymnax, envpool, brax, xland
env_name: The name of the environment to use
env_kwargs: Additional keyword arguments for the environment
eval_env_kwargs: Additional keyword arguments for the evaluation environment
n_envs: Number of environments to use in parallel
algorithm: The algorithm to use. Currently supported: dqn, ppo, sac
cnn_policy: Whether to use a CNN policy
deterministic_eval: Whether to use deterministic evaluation. This disables exploration behaviors during evaluation.
nas_config: Configuration for the network architecture
checkpoint: A list of elements the checkpoint should contain
checkpoint_name: The name of the checkpoint
checkpoint_dir: The directory to save the checkpoint in
objectives: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions
optimize_objectives: Whether to maximize or minimize the objectives
state_features: The features of the RL algorithm’s state to return
n_steps: The number of steps in the configuration schedule. Using 1 will result in a static configuration
n_total_timesteps: The total number of timesteps to train in each schedule interval
n_eval_steps: The number of steps to evaluate the agent for
n_eval_episodes: The number of episodes to evaluate the agent for
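As a rough illustration of how these options fit together, the sketch below writes a subset of the ‘autorl’ keys as a plain Python dictionary and checks them against the supported values listed above. It does not use the ARLBench API. The helper names (`validate_autorl`, `total_training_timesteps`), the concrete values (including the environment name), and the choice to express `objectives` as a list of strings are all assumptions made for this example. The total-budget calculation assumes that `n_total_timesteps` applies once per schedule interval, as the description of that option suggests.

```python
# Illustrative sketch only: option names and supported values are taken from
# the list above; every concrete value here is an assumption, not a default.

SUPPORTED = {
    "env_framework": {"gymnax", "envpool", "brax", "xland"},
    "algorithm": {"dqn", "ppo", "sac"},
}
SUPPORTED_OBJECTIVES = {"reward_mean", "reward_std", "runtime", "emissions"}

# Hypothetical upper-level ('autorl') settings for a single run.
autorl = {
    "seed": 42,
    "env_framework": "gymnax",
    "env_name": "CartPole-v1",      # assumed environment name
    "n_envs": 8,
    "algorithm": "dqn",
    "cnn_policy": False,
    "deterministic_eval": True,
    "objectives": ["reward_mean"],  # assumed to be a list of names
    "n_steps": 10,                  # 10 schedule intervals
    "n_total_timesteps": 100_000,   # timesteps per schedule interval
    "n_eval_episodes": 10,
}


def validate_autorl(options):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key, allowed in SUPPORTED.items():
        value = options.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    for objective in options.get("objectives", []):
        if objective not in SUPPORTED_OBJECTIVES:
            problems.append(f"unknown objective {objective!r}")
    if options.get("n_steps", 1) < 1:
        problems.append("n_steps must be at least 1")
    return problems


def total_training_timesteps(options):
    """Overall budget, assuming n_total_timesteps is spent per interval."""
    return options["n_steps"] * options["n_total_timesteps"]


if __name__ == "__main__":
    print(validate_autorl(autorl))
    print(total_training_timesteps(autorl))
```

With `n_steps` set to 1, the same calculation reduces to a single interval of `n_total_timesteps`, which matches the static-configuration case described above.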
The low-level configuration options are found under the ‘hp_config’ keys, which contain the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.