mdp_playground.envs.gym_env_wrapper.GymEnvWrapper
class mdp_playground.envs.gym_env_wrapper.GymEnvWrapper(env, **config)

Bases: gym.core.Env
Wraps an OpenAI Gym environment so that dimensions corresponding to MDP Playground can be modified on it. Documentation for the supported dimensions below can be found in mdp_playground/envs/rl_toy_env.py.
Currently supported dimensions:
- transition noise (discrete)
- reward delay
- reward noise
Also supports wrapping with AtariPreprocessing from OpenAI Gym or wrap_deepmind from Ray RLlib.
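A minimal usage sketch follows. The config keys and values shown are illustrative assumptions based on the dimensions listed above; the authoritative key names and semantics are documented in mdp_playground/envs/rl_toy_env.py.

import gym

from mdp_playground.envs.gym_env_wrapper import GymEnvWrapper

# Illustrative config; key names such as "delay" and "transition_noise"
# are assumptions, not a verified API. Check rl_toy_env.py for the
# documented names of each dimension.
config = {
    "seed": 0,
    "delay": 1,               # delay each reward by 1 timestep
    "transition_noise": 0.1,  # probability of a random transition (discrete)
}

env = GymEnvWrapper(gym.make("CartPole-v1"), **config)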
Methods

__init__(env, **config)
    Initialize self.

close()
    Override close in your subclass to perform any necessary cleanup.

render([mode])
    Renders the environment.

reset()
    Resets the state of the environment and returns an initial observation.

seed([seed])
    Initialises the Numpy RNG for the environment by calling a utility for this in Gym.

step(action)
    Run one timestep of the environment's dynamics.
Attributes

action_space
metadata
observation_space
reward_range
spec
unwrapped
    Completely unwrap this env.
close()
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
render(mode='human')
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
- human: render to the current display or terminal and return nothing. Usually for human consumption.
- rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note:
Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.

Args:
mode (str): the mode to render with
Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception
reset()
Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
seed(seed=None)
Initialises the Numpy RNG for the environment by calling a utility for this in Gym.

Parameters:
seed (int) – seed to initialise the np_random instance held by the environment. Cannot be numpy.int64 or a similar numpy integer type, because Gym doesn't accept those.

Returns:
The seed returned by Gym

Return type:
int
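For example, a seed drawn from a numpy RNG must be cast to a plain Python int before being passed in. A short sketch, where env is the wrapper instance from the earlier example:

import numpy as np

raw_seed = np.random.default_rng().integers(2**31)  # numpy.int64 scalar
returned_seed = env.seed(int(raw_seed))  # cast to int; Gym rejects numpy integer types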
step(action)
Run one timestep of the environment's dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment's state.
Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent

Returns:
observation (object): agent's observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
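The standard Gym interaction loop applies unchanged to the wrapped environment. A minimal sketch, reusing the env instance from the earlier example and a random policy purely for illustration:

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random action, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()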
property unwrapped
Completely unwrap this env.

Returns:
gym.Env: The base non-wrapped gym.Env instance
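This is useful, for instance, to reach attributes of the base environment that the wrapper does not expose. A hypothetical sketch, again reusing env from the earlier example:

base_env = env.unwrapped  # the original gym.Env, with all wrappers removed
print(type(base_env).__name__)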