carl.envs.carl_env

Classes

CARLEnv(env[, n_envs, contexts, ...])

Meta-environment formulating the original environments as cMDPs.

Interfaces

class carl.envs.carl_env.CARLEnv(env, n_envs=1, contexts={}, hide_context=True, add_gaussian_noise_to_context=False, gaussian_noise_std_percentage=0.01, logger=None, max_episode_length=1000000, scale_context_features='no', default_context=None, state_context_features=None, context_mask=None, dict_observation_space=False, context_selector=None, context_selector_kwargs=None)[source]

Bases: Wrapper

Meta-environment formulating the original environments as cMDPs.

Here, a context feature can be anything defining the behavior of the environment. An instance is the environment with a specific context.

The context can change after each episode.

If not all keys are present in the provided context(s), the contexts are filled with the default context values when the class is initialized.

Parameters:
  • env (gym.Env) – Environment whose context features are made visible, i.e. the environment that is turned into a cMDP.

  • contexts (Contexts) – Dict of contexts/instances. Keys are context ids, values are contexts as Dict[context feature id, context feature value].

  • hide_context (bool = True) – If False, the context will be appended to the original environment’s state.

  • add_gaussian_noise_to_context (bool = False) – Whether to add Gaussian noise to the context with the relative standard deviation ‘gaussian_noise_std_percentage’.

  • gaussian_noise_std_percentage (float = 0.01) – The relative standard deviation for the Gaussian noise. The actual standard deviation is calculated by ‘gaussian_noise_std_percentage’ * context feature value.

  • logger (TrialLogger, optional) – Optional TrialLogger which takes care of setting up logging directories and handles custom logging.

  • max_episode_length (int = 1000000) – Maximum length of an episode in (time)steps. Acts as a cutoff.

  • scale_context_features (str = "no") – Whether to scale context features. Available modes are ‘no’, ‘by_mean’ and ‘by_default’. ‘by_mean’ scales the context features by their mean over all passed instances and ‘by_default’ scales the context features by their default values (‘default_context’).

  • default_context (Context) – The default context of the environment. Used for scaling the context features if applicable. Used for filling incomplete contexts.

  • state_context_features (Optional[List[str]] = None) – If the context is visible to the agent (hide_context=False), the context features are appended to the state. state_context_features specifies which of the context features are appended to the state. The default is appending all context features.

  • context_mask (Optional[List[str]]) – Names of context features to be ignored when appending context features to the state.

  • context_selector (Optional[Union[AbstractSelector, type(AbstractSelector)]]) – Context selector class or instance, e.g. RoundRobinSelector (default) or RandomSelector. Should subclass AbstractSelector.

  • context_selector_kwargs (Optional[Dict]) – Optional kwargs for context selector class.

Raises:
  • ValueError – If the choice of instance_mode is not available.

  • ValueError – If the choice of scale_context_features is not available.
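The noise and scaling rules described for gaussian_noise_std_percentage and scale_context_features can be illustrated with a minimal pure-Python sketch. The helper names noisy_feature and scale_context are hypothetical and not part of CARL. The sketch assumes that "scaling by" a reference value means dividing by it, that all contexts contain every feature, and that the absolute value is used for the standard deviation; none of this is confirmed by the text above.

```python
import random
from statistics import mean


def noisy_feature(value, std_percentage=0.01, rng=None):
    """Perturb one context feature with Gaussian noise.

    The standard deviation is std_percentage * value, as documented.
    Taking the absolute value is an assumption so that negative
    feature values still yield a valid standard deviation.
    """
    rng = rng or random.Random()
    return rng.gauss(value, abs(std_percentage * value))


def scale_context(context, contexts, default_context, mode="no"):
    """Scale context features (hypothetical helper, assumes division)."""
    if mode == "no":
        return dict(context)
    if mode == "by_mean":
        # Mean of each feature over all passed instances.
        reference = {k: mean(c[k] for c in contexts.values()) for k in context}
    elif mode == "by_default":
        reference = default_context
    else:
        raise ValueError(f"Unknown scale_context_features mode: {mode!r}")
    # Note: a reference value of 0 would raise ZeroDivisionError here.
    return {k: v / reference[k] for k, v in context.items()}


contexts = {0: {"gravity": 9.0}, 1: {"gravity": 11.0}}
default_context = {"gravity": 10.0}
scaled = scale_context(contexts[0], contexts, default_context, mode="by_mean")
```

An unknown mode raises ValueError, mirroring the second entry under Raises.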

build_observation_space(env_lower_bounds=None, env_upper_bounds=None, context_bounds=None)[source]

Build observation space of environment.

If hide_context = False, add correct bounds for the context features to the observation space.

Parameters:
  • env_lower_bounds (Optional[Union[List, np.array]], default=None) – Lower bounds for environment observation space. If env_lower_bounds and env_upper_bounds both are None, (re-)create bounds (low=-inf, high=inf) with correct dimension.

  • env_upper_bounds (Optional[Union[List, np.array]], default=None) – Upper bounds for environment observation space.

  • context_bounds (Optional[Dict[str, Tuple[float, float, float]]], default=None) – Lower and upper bounds for context features. The bounds are provided as a Dict containing the context feature names/ids as keys and the bounds per feature as a tuple (low, high, dtype). If None and the context should not be hidden, creates default bounds with (low=-inf, high=inf) with correct dimension.

Raises:

ValueError – If (env.)observation space is not gym.spaces.Box and the context should not be hidden (hide_context = False).

Return type:

None
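How the context bounds extend the environment bounds can be sketched without gym, using plain lists. The function concat_bounds is a hypothetical helper, not CARL's implementation. It only shows the documented convention: context_bounds maps feature names to (low, high, dtype) tuples, and missing bounds fall back to (-inf, inf).

```python
import math


def concat_bounds(env_low, env_high, context_names, context_bounds=None):
    """Append per-feature context bounds to the environment bounds."""
    low, high = list(env_low), list(env_high)
    for name in context_names:
        if context_bounds is None:
            # Default bounds when none are provided.
            feature_low, feature_high = -math.inf, math.inf
        else:
            # The dtype entry is ignored in this sketch.
            feature_low, feature_high, _dtype = context_bounds[name]
        low.append(feature_low)
        high.append(feature_high)
    return low, high


low, high = concat_bounds(
    env_low=[-1.0, -1.0],
    env_high=[1.0, 1.0],
    context_names=["gravity"],
    context_bounds={"gravity": (0.1, 20.0, float)},
)
```

In the real class the resulting arrays would presumably be used to construct a gym.spaces.Box; that step is omitted here.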

fill_context_with_default(context)[source]

Fill the context with the default values if entries are missing.

Parameters:

context (Dict[str, Any]) – Context to complete. Missing entries are taken from the default context.

Return type:

context
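The filling behavior can be pictured as a dictionary merge in which provided values take precedence over defaults. The following is a minimal sketch under that assumption; fill_with_default is a hypothetical standalone function, whereas the actual method reads the default context from the environment instance.

```python
def fill_with_default(context, default_context):
    """Return a copy of context with missing keys taken from default_context."""
    filled = dict(default_context)  # start from the defaults
    filled.update(context)          # provided values override defaults
    return filled


default_context = {"gravity": 9.8, "mass": 1.0}
partial = {"mass": 2.5}
filled = fill_with_default(partial, default_context)
```

Neither input dictionary is modified in this sketch.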

reset(seed=None, options=None, **kwargs)[source]

Reset environment.

Parameters:

kwargs (Dict) – Any keyword arguments passed to env.reset().

Return type:

Union[ObsType, tuple[ObsType, dict]]

Returns:

  • state – State of environment after reset.

  • info_dict (dict) – Also returned if return_info=True.

step(action)[source]

Step the environment.

  1. Step

  2. Add (potentially scaled) context features to state if hide_context = False.

Emits done if the environment has taken more steps than the cutoff (max_episode_length).

Parameters:

action (Any) – Action to pass to env.step.

Returns:

state, reward, done, info – Standard signature.

Return type:

Any, Any, bool, Dict
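The two steps and the cutoff described for step can be sketched with a small stand-in wrapper. StepSketch and DummyEnv are hypothetical classes for illustration only. The sketch assumes list-valued states, skips context scaling, and uses a strict "more than" comparison for the cutoff as worded above; the exact comparison used by CARL is not confirmed here.

```python
class DummyEnv:
    """Stand-in environment that never terminates on its own."""

    def step(self, action):
        return [0.0], 1.0, False, {}


class StepSketch:
    """Hypothetical wrapper mirroring the documented step logic."""

    def __init__(self, env, context, hide_context=True, max_episode_length=1_000_000):
        self.env = env
        self.context = context
        self.hide_context = hide_context
        self.max_episode_length = max_episode_length
        self.step_counter = 0

    def step(self, action):
        # 1. Step the wrapped environment.
        state, reward, done, info = self.env.step(action)
        # 2. Append the context features if the context is visible.
        if not self.hide_context:
            state = list(state) + list(self.context.values())
        # Cutoff: emit done once more steps than the limit were taken.
        self.step_counter += 1
        if self.step_counter > self.max_episode_length:
            done = True
        return state, reward, done, info


wrapped = StepSketch(DummyEnv(), {"gravity": 9.8}, hide_context=False, max_episode_length=2)
results = [wrapped.step(action=0) for _ in range(3)]
```

With max_episode_length=2, the first two steps return done=False and the third returns done=True under the strict comparison assumed here.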