mdp_playground.envs.gym_env_wrapper.GymEnvWrapper

class mdp_playground.envs.gym_env_wrapper.GymEnvWrapper(env, **config)[source]

Bases: gym.core.Env

Wraps an OpenAI Gym environment so that the MDP Playground dimensions listed below can be applied to it. Documentation for the supported dimensions can be found in mdp_playground/envs/rl_toy_env.py.

Currently supported dimensions:

  • transition noise (discrete)

  • reward delay

  • reward noise

Also supports wrapping with AtariPreprocessing from OpenAI Gym or wrap_deepmind from Ray RLlib.
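A minimal usage sketch follows. The config keys seed, transition_noise, delay and reward_noise are assumptions based on the dimension names above; see mdp_playground/envs/rl_toy_env.py for the authoritative names and semantics:

import gym
from mdp_playground.envs.gym_env_wrapper import GymEnvWrapper

base_env = gym.make("CartPole-v1")
env = GymEnvWrapper(
    base_env,
    seed=0,                # assumed key: seeds the wrapper's NumPy RNG
    transition_noise=0.1,  # assumed key: probability of a noisy (random) transition
    delay=2,               # assumed key: delay rewards by 2 timesteps
    reward_noise=lambda rng: rng.normal(0, 0.5),  # assumed key and signature: additive reward noise per step
)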

__init__(env, **config)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(env, **config): Initialize self.

close(): Override close in your subclass to perform any necessary cleanup.

render([mode]): Renders the environment.

reset(): Resets the state of the environment and returns an initial observation.

seed([seed]): Initialises the NumPy RNG for the environment by calling Gym's seeding utility.

step(action): Run one timestep of the environment's dynamics.

Attributes

action_space

metadata

observation_space

reward_range

spec

unwrapped: Completely unwrap this env.

close()

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(mode='human')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:
Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Args:

mode (str): the mode to render with

Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()[source]

Resets the state of the environment and returns an initial observation.

Returns:

observation (object): the initial observation.

seed(seed=None)[source]

Initialises the NumPy RNG for the environment by calling Gym's seeding utility.

Parameters

seed (int) – seed to initialise the np_random instance held by the environment. Cannot use numpy.int64 or similar because Gym doesn’t accept it.

Returns

The seed returned by Gym

Return type

int
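A short usage sketch (continuing with the wrapped env from the example above); note the cast to a plain Python int when the seed comes from NumPy:

import numpy as np

seed_used = env.seed(42)  # returns the seed used by Gym (an int)
np_seed = np.int64(7)
env.seed(int(np_seed))    # cast first: Gym doesn't accept numpy.int64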

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent's observation of the current environment

reward (float): amount of reward returned after the previous action

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
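A minimal interaction loop tying reset() and step() together (a sketch, assuming the wrapped env from the earlier example):

obs = env.reset()                       # initial observation
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random agent, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()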

property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance