Functionality through Wrappers

To provide additional functionality to environments without changing their interface, we can use so-called wrappers. They execute environment resets and steps internally, but can either alter the environment behavior (e.g. by adding noise) or record information about the environment. Wrapping an existing environment is simple:

from dacbench.wrappers import PerformanceTrackingWrapper

# env is an existing environment instance created beforehand
wrapped_env = PerformanceTrackingWrapper(env)

The provided wrappers for tracking performance, state and action information are designed to be used with DACBench’s logging functionality.
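To illustrate the wrapper pattern described above, here is a minimal, self-contained sketch. It is not DACBench's implementation: `ToyEnv` and `RewardLogWrapper` are hypothetical names, and the five-value step return is assumed to follow the convention used by most wrappers on this page.

```python
class ToyEnv:
    """Hypothetical stand-in environment with a five-value step return."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        state = [float(self.t)]
        reward = -float(action)
        terminated = False
        truncated = self.t >= self.horizon
        return state, reward, terminated, truncated, {}


class RewardLogWrapper:
    """Hypothetical wrapper: same interface as the env, but records rewards."""

    def __init__(self, env):
        self.env = env
        self.rewards = []

    def reset(self, seed=None, options=None):
        # Resets are passed straight through to the wrapped environment.
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        # The step is executed internally and its reward is recorded.
        state, reward, terminated, truncated, info = self.env.step(action)
        self.rewards.append(reward)
        return state, reward, terminated, truncated, info


wrapped_env = RewardLogWrapper(ToyEnv())
state, info = wrapped_env.reset()
done = False
while not done:
    state, reward, terminated, truncated, info = wrapped_env.step(1)
    done = terminated or truncated
```

The calling code is identical with or without the wrapper, which is what allows wrappers to be stacked or removed freely.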

class dacbench.wrappers.ActionFrequencyWrapper(env, action_interval=None, logger=None)[source]

Bases: Wrapper

Wrapper to track action frequency.

Includes interval mode that returns frequencies in lists of len(interval) instead of one long list.
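As a rough illustration of interval mode, the sketch below groups a flat list of recorded values into lists of a fixed length. The helper name `split_into_intervals` is hypothetical, and keeping a shorter final interval is an assumption of this sketch rather than documented behavior.

```python
def split_into_intervals(values, interval):
    """Group a flat list into consecutive chunks of at most `interval` items."""
    return [values[i:i + interval] for i in range(0, len(values), interval)]


actions = [0, 1, 1, 2, 0, 2, 1]
overall = actions                               # one long list
per_interval = split_into_intervals(actions, 3)  # lists of len(interval)
```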

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to
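The attribute fallthrough described by `__getattribute__` and `__setattr__` can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `DelegatingWrapper`, `ToyEnv` and the `_own` tuple of wrapper-owned names are all hypothetical.

```python
class DelegatingWrapper:
    """Hypothetical wrapper that falls back to the wrapped env for attributes."""

    # Names that live on the wrapper itself (an assumption of this sketch).
    _own = ("env", "log")

    def __init__(self, env):
        object.__setattr__(self, "env", env)
        object.__setattr__(self, "log", [])

    def __getattribute__(self, name):
        own = object.__getattribute__(self, "_own")
        if name in own or name.startswith("_"):
            return object.__getattribute__(self, name)
        # Everything else is looked up on the wrapped environment.
        return getattr(object.__getattribute__(self, "env"), name)

    def __setattr__(self, name, value):
        if name in object.__getattribute__(self, "_own"):
            object.__setattr__(self, name, value)
        else:
            setattr(object.__getattribute__(self, "env"), name, value)


class ToyEnv:
    def __init__(self):
        self.instance = "a"


env = ToyEnv()
wrapper = DelegatingWrapper(env)
wrapper.instance = "b"   # not a wrapper attribute, so it is set on env
wrapper.log = ["step"]   # wrapper attribute, so it stays on the wrapper
```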

get_actions()[source]

Get action progression.

Returns:

np.array or np.array, np.array

all actions or all actions and interval sorted actions

render_action_tracking()[source]

Render action progression.

Returns:

np.array

RGB data of action tracking

step(action)[source]

Execute environment step and record action.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, dict

state, reward, done, metainfo

class dacbench.wrappers.EpisodeTimeWrapper(env, time_interval=None, logger=None)[source]

Bases: Wrapper

Wrapper to track time spent per episode.

Includes interval mode that returns times in lists of len(interval) instead of one long list.

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

get_times()[source]

Get times.

Returns:

np.array or np.array, np.array

all times or all times and interval sorted times

render_episode_time()[source]

Render episode times.

render_step_time()[source]

Render step times.

step(action)[source]

Execute environment step and record time.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, bool, dict

state, reward, terminated, truncated, metainfo
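A minimal sketch of how step and episode times could be recorded around an environment step is shown below. `ToyEnv` and `StepTimer` are hypothetical, and the use of `time.perf_counter` is an assumption; the actual wrapper may measure time differently.

```python
import time


class ToyEnv:
    """Hypothetical environment whose episodes are truncated after two steps."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        truncated = self.t % 2 == 0
        return [self.t], 0.0, False, truncated, {}


class StepTimer:
    """Hypothetical wrapper recording wall-clock time per step and per episode."""

    def __init__(self, env):
        self.env = env
        self.step_times = []
        self.episode_times = []
        self._episode_total = 0.0

    def step(self, action):
        start = time.perf_counter()
        result = self.env.step(action)
        elapsed = time.perf_counter() - start
        self.step_times.append(elapsed)
        self._episode_total += elapsed
        terminated, truncated = result[2], result[3]
        if terminated or truncated:
            # Close the episode and start accumulating the next one.
            self.episode_times.append(self._episode_total)
            self._episode_total = 0.0
        return result


timer = StepTimer(ToyEnv())
for _ in range(4):
    timer.step(0)
```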

class dacbench.wrappers.InstanceSamplingWrapper(env, sampling_function=None, instances=None, reset_interval=0)[source]

Bases: Wrapper

Wrapper to sample a new instance at a given time point.

Instances can either be sampled using a given method or from a distribution inferred from a given list of instances.

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

fit_dist(instances)[source]

Approximate instance distribution in given instance set.

Parameters:
  • instances (List) – instance set

Returns:

method

sampling method for new instances
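One plausible way to approximate an instance distribution is to fit an independent normal distribution to each instance component and sample from it. The sketch below assumes instances are equal-length lists of numbers and uses Python's `statistics` and `random` modules; the distribution family and these helper details are assumptions, not a description of what `fit_dist` actually does.

```python
import random
import statistics


def fit_dist(instances):
    """Fit an independent normal distribution to each instance component."""
    components = list(zip(*instances))
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in components]

    def sample():
        # Draw every component from its own fitted normal distribution.
        return [random.gauss(mu, sigma) for mu, sigma in params]

    return sample


random.seed(0)
instances = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
sample_instance = fit_dist(instances)
new_instance = sample_instance()
```

The returned function matches the role of a `sampling_function`: it takes no arguments and produces a new instance on each call.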

reset(seed=None, options=None)[source]

Reset environment and use sampled instance for training.

Returns:

np.array

state

class dacbench.wrappers.MultiDiscreteActionWrapper(env)[source]

Bases: Wrapper

Wrapper to cast MultiDiscrete action spaces to Discrete. This should improve usability with standard RL libraries.

step(action)[source]

Maps discrete action value to array.
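The mapping from a single discrete action to a multi-dimensional one can be sketched by enumerating every combination of the original dimensions. The helper `build_action_mapper` is hypothetical, and the enumeration order produced by `itertools.product` is an assumption of this sketch.

```python
import itertools


def build_action_mapper(nvec):
    """Enumerate every combination of a MultiDiscrete space with sizes `nvec`."""
    return list(itertools.product(*[range(n) for n in nvec]))


mapper = build_action_mapper([2, 3])  # 2 * 3 = 6 discrete actions
multi_action = mapper[4]              # discrete action 4 as a per-dimension tuple
```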

class dacbench.wrappers.ObservationWrapper(env)[source]

Bases: Wrapper

Wrapper to convert observation spaces to spaces.Box for convenience.

Currently only supports Dict -> Box

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

flatten(state_dict)[source]

Flatten dict to list.
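Flattening a dictionary observation into a list might look like the sketch below. Iterating over keys in sorted order and expanding list-valued entries are assumptions made here for a deterministic result; the wrapper's own ordering may differ.

```python
def flatten(state_dict):
    """Flatten a dict of scalars and sequences into one flat list."""
    flat = []
    for key in sorted(state_dict):
        value = state_dict[key]
        if isinstance(value, (list, tuple)):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


flat_state = flatten({"b": [1, 2], "a": 0.5})
```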

reset(seed=None, options=None)[source]

Reset environment and return the flattened state.

Returns:

np.array, dict

state, info

step(action)[source]

Execute environment step and return the flattened state.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, bool, dict

state, reward, terminated, truncated, metainfo

class dacbench.wrappers.PerformanceTrackingWrapper(env, performance_interval=None, track_instance_performance=True, logger=None)[source]

Bases: Wrapper

Wrapper to track episode performance.

Includes interval mode that returns performance in lists of len(interval) instead of one long list.
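The bookkeeping behind episode performance tracking can be sketched as accumulating rewards until an episode ends, then storing the total overall and per instance. `EpisodePerformanceTracker` and its `record` method are hypothetical, and treating performance as the summed episode reward is an assumption of this sketch.

```python
from collections import defaultdict


class EpisodePerformanceTracker:
    """Hypothetical helper that sums rewards per episode and per instance."""

    def __init__(self):
        self.overall = []
        self.per_instance = defaultdict(list)
        self._running = 0.0

    def record(self, reward, done, instance_id):
        self._running += reward
        if done:
            # Episode finished: store its total and start a new one.
            self.overall.append(self._running)
            self.per_instance[instance_id].append(self._running)
            self._running = 0.0


tracker = EpisodePerformanceTracker()
transitions = [
    (1.0, False, "a"), (2.0, True, "a"),
    (-1.0, True, "b"),
    (0.5, False, "a"), (0.5, True, "a"),
]
for reward, done, instance_id in transitions:
    tracker.record(reward, done, instance_id)

mean_a = sum(tracker.per_instance["a"]) / len(tracker.per_instance["a"])
```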

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

get_performance()[source]

Get performance.

Returns:

np.array or np.array, np.array or np.array, dict or np.array, np.array, dict

overall performance, optionally followed by interval sorted performance and per-instance performance

render_instance_performance()[source]

Plot mean performance for each instance.

render_performance()[source]

Plot performance.

step(action)[source]

Execute environment step and record performance.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, dict

state, reward, done, metainfo

class dacbench.wrappers.PolicyProgressWrapper(env, compute_optimal)[source]

Bases: Wrapper

Wrapper to track progress towards optimal policy.

Can only be used if there is a way to obtain the optimal policy for a given instance.
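As a hedged illustration of tracking progress towards an optimal policy, the sketch below compares the actions taken in each episode with the optimal actions for the instance. The toy `compute_optimal`, the instance layout and the Euclidean distance measure are all assumptions of this example, not the wrapper's documented behavior.

```python
import math


def compute_optimal(instance):
    """Hypothetical oracle: the optimal action at every step is the target."""
    return [instance["target"]] * instance["length"]


def policy_distance(actions, optimal_actions):
    """Euclidean distance between the taken and the optimal action sequence."""
    return math.dist(actions, optimal_actions)


instance = {"target": 2, "length": 3}
optimal = compute_optimal(instance)
episodes = [[0, 2, 2], [2, 2, 2]]
progress = [policy_distance(actions, optimal) for actions in episodes]
```

A decreasing sequence of distances indicates that the learned policy is approaching the optimal one.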

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

render_policy_progress()[source]

Plot progress.

step(action)[source]

Execute environment step and record distance.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, bool, dict

state, reward, terminated, truncated, metainfo

class dacbench.wrappers.RewardNoiseWrapper(env, noise_function=None, noise_dist='standard_normal', dist_args=None)[source]

Bases: Wrapper

Wrapper to add noise to the reward signal.

Noise can be sampled from a custom distribution or any distribution in numpy’s random module.

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

add_noise(dist, args)[source]

Make noise function from distribution name and arguments.

Parameters:
  • dist (str) – Name of distribution

  • args (list) – List of distribution arguments

Returns:

function

Noise sampling function
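Building a noise function from a distribution name and an argument list can be sketched with a name lookup on a random module. To stay dependency-free, this example uses Python's built-in `random` module as a stand-in for numpy's random module, and it assumes the noise is added to the reward; both choices are assumptions of the sketch.

```python
import random


def add_noise(dist, args):
    """Return a function sampling noise from the named distribution."""
    sampler = getattr(random, dist)  # e.g. random.uniform or random.gauss

    def noise():
        return sampler(*args)

    return noise


random.seed(1)
noise = add_noise("uniform", [-0.1, 0.1])
noisy_reward = 1.0 + noise()
```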

step(action)[source]

Execute environment step and add noise.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, bool, dict

state, reward, terminated, truncated, metainfo

class dacbench.wrappers.StateTrackingWrapper(env, state_interval=None, logger=None)[source]

Bases: Wrapper

Wrapper to track state changes over time.

Includes interval mode that returns states in lists of len(interval) instead of one long list.

__getattribute__(name)[source]

Get attribute value of wrapper if available and of env if not.

Parameters:
  • name (str) – Attribute to get

Returns:

value

Value of given name

__setattr__(name, value)[source]

Set attribute in wrapper if available and in env if not.

Parameters:
  • name (str) – Attribute to set

  • value – Value to set attribute to

get_states()[source]

Get state progression.

Returns:

np.array or np.array, np.array

all states or all states and interval sorted states

render_state_tracking()[source]

Render state progression.

Returns:

np.array

RGB data of state tracking

reset(seed=None, options=None)[source]

Reset environment and record starting state.

Returns:

np.array, dict

state, info

step(action)[source]

Execute environment step and record state.

Parameters:
  • action (int) – action to execute

Returns:

np.array, float, bool, dict

state, reward, done, metainfo