CARL Classic Control Environments

Classic Control is a problem suite included in OpenAI’s gym that consists of simple physics simulation tasks. The context features here are therefore also physics-based, e.g. friction, mass or gravity.

CARL Pendulum Environment

In Pendulum, the agent’s task is to swing a pendulum up from a random starting position and balance it upright. The action is the direction and magnitude of the force the agent applies to the pendulum.

[Figure: Influence of context settings on an agent trained on the default environment.]
Defaults and Bounds

| Context Feature | Default | Bounds                          |
|-----------------|---------|---------------------------------|
| max_speed       | 8.0     | (-inf, inf, <class 'float'>)    |
| dt              | 0.05    | (0, inf, <class 'float'>)       |
| g               | 10.0    | (0, inf, <class 'float'>)       |
| m               | 1.0     | (1e-06, inf, <class 'float'>)   |
| l               | 1.0     | (1e-06, inf, <class 'float'>)   |
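To make the role of these context features concrete, here is a minimal sketch of one simulation step in the style of gym's Pendulum dynamics. It is an illustration under assumptions, not CARL's implementation: the function name `pendulum_step` is hypothetical, the torque limit of 2.0 is gym's default and not a context feature in the table, and the angle convention (theta = 0 is upright) follows recent gym versions.

```python
import math

# Default context, copied from the Pendulum table.
DEFAULT_CONTEXT = {"max_speed": 8.0, "dt": 0.05, "g": 10.0, "m": 1.0, "l": 1.0}

# Assumption: gym's default torque limit; it is not listed as a context feature.
MAX_TORQUE = 2.0


def pendulum_step(theta, theta_dot, torque, context=DEFAULT_CONTEXT):
    """Advance a gym-style pendulum by one step (theta = 0 means upright)."""
    g, m, l = context["g"], context["m"], context["l"]
    dt, max_speed = context["dt"], context["max_speed"]

    # Clip the requested torque to the allowed range.
    u = max(-MAX_TORQUE, min(MAX_TORQUE, torque))

    # Angular acceleration: gravity term scaled by g/l, torque term by 1/(m*l^2).
    theta_ddot = 3.0 * g / (2.0 * l) * math.sin(theta) + 3.0 / (m * l**2) * u

    # Integrate, then limit the angular velocity to max_speed.
    new_theta_dot = theta_dot + theta_ddot * dt
    new_theta_dot = max(-max_speed, min(max_speed, new_theta_dot))
    new_theta = theta + new_theta_dot * dt
    return new_theta, new_theta_dot


# Doubling the mass halves the effect of the same torque.
heavy = dict(DEFAULT_CONTEXT, m=2.0)
print(pendulum_step(0.0, 0.0, 1.0))
print(pendulum_step(0.0, 0.0, 1.0, heavy))
```

The sketch shows why an agent trained on the default context can struggle elsewhere: a larger `m` or `l` weakens the effect of the same action, while `g` and `dt` change how quickly the pendulum falls between decisions.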

CARL CartPole Environment

CartPole, like Pendulum, asks the agent to balance a pole upright. This time, however, the agent does not apply force to the pole directly; instead, it moves the cart on which the pole is placed to the left or to the right.

[Figure: Influence of context settings on an agent trained on the default environment.]
Defaults and Bounds

| Context Feature | Default | Bounds                          |
|-----------------|---------|---------------------------------|
| gravity         | 9.8     | (0.1, inf, <class 'float'>)     |
| masscart        | 1.0     | (0.1, 10, <class 'float'>)      |
| masspole        | 0.1     | (0.01, 1, <class 'float'>)      |
| pole_length     | 0.5     | (0.05, 5, <class 'float'>)      |
| force_magnifier | 10.0    | (1, 100, <class 'int'>)         |
| update_interval | 0.02    | (0.002, 0.2, <class 'float'>)   |
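Defaults and bounds like these are typically used together: a custom context overrides some defaults, and each value is checked against its bounds. The following is a minimal sketch of that idea using the CartPole values from the table. The helper `make_context` is hypothetical and not part of CARL, and treating the bounds as inclusive is an assumption.

```python
import math

# CartPole defaults and bounds, copied from the table: (lower, upper, type).
CARTPOLE_DEFAULTS = {
    "gravity": 9.8,
    "masscart": 1.0,
    "masspole": 0.1,
    "pole_length": 0.5,
    "force_magnifier": 10.0,
    "update_interval": 0.02,
}
CARTPOLE_BOUNDS = {
    "gravity": (0.1, math.inf, float),
    "masscart": (0.1, 10, float),
    "masspole": (0.01, 1, float),
    "pole_length": (0.05, 5, float),
    "force_magnifier": (1, 100, int),
    "update_interval": (0.002, 0.2, float),
}


def make_context(overrides, defaults=CARTPOLE_DEFAULTS, bounds=CARTPOLE_BOUNDS):
    """Merge overrides into the defaults and reject invalid values.

    Hypothetical helper: bounds are treated as inclusive (an assumption),
    and the recorded type is not enforced in this sketch.
    """
    context = dict(defaults)
    for name, value in overrides.items():
        if name not in bounds:
            raise KeyError(f"unknown context feature: {name}")
        lower, upper, _type = bounds[name]
        if not lower <= value <= upper:
            raise ValueError(f"{name}={value} is outside [{lower}, {upper}]")
        context[name] = value
    return context


# A context with a pole twice as long as the default.
long_pole = make_context({"pole_length": 1.0})
print(long_pole["pole_length"], long_pole["gravity"])
```

Copying the defaults before applying overrides keeps the default context untouched, so several contexts can be derived from the same base.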

CARL Acrobot Environment

Acrobot is another swing-up task: the goal is to swing the end of the lower of two links up to a given height. The agent accomplishes this by actuating the joint that connects the two links.

[Figure: Influence of context settings on an agent trained on the default environment.]
Defaults and Bounds

| Context Feature  | Default            | Bounds                                                      |
|------------------|--------------------|-------------------------------------------------------------|
| link_length_1    | 1.0                | (0.1, 10, <class 'float'>)                                  |
| link_length_2    | 1.0                | (0.1, 10, <class 'float'>)                                  |
| link_mass_1      | 1.0                | (0.1, 10, <class 'float'>)                                  |
| link_mass_2      | 1.0                | (0.1, 10, <class 'float'>)                                  |
| link_com_1       | 0.5                | (0, 1, <class 'float'>)                                     |
| link_com_2       | 0.5                | (0, 1, <class 'float'>)                                     |
| link_moi         | 1.0                | (0.1, 10, <class 'float'>)                                  |
| max_velocity_1   | 12.566370614359172 | (1.2566370614359172, 125.66370614359172, <class 'float'>)   |
| max_velocity_2   | 28.274333882308138 | (2.827433388230814, 282.7433388230814, <class 'float'>)     |
| torque_noise_max | 0.0                | (-1.0, 1.0, <class 'float'>)                                |
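The long decimal values for the velocity limits are easier to read as multiples of π: the defaults equal 4π and 9π, and each bound lies a factor of 10 below or above its default. The short check below is a reading aid rather than CARL's own code; the helper name `scaled_bounds` is hypothetical.

```python
import math

# The velocity defaults from the table, written as multiples of pi.
MAX_VELOCITY_1 = 4 * math.pi
MAX_VELOCITY_2 = 9 * math.pi


def scaled_bounds(default, low_factor=0.1, high_factor=10.0):
    """Return (lower, upper) bounds as fixed multiples of a default value."""
    return (low_factor * default, high_factor * default)


# Compare against the values listed in the table (up to floating-point rounding).
print(math.isclose(MAX_VELOCITY_1, 12.566370614359172))
print(math.isclose(MAX_VELOCITY_2, 28.274333882308138))
print(scaled_bounds(MAX_VELOCITY_1))
print(scaled_bounds(MAX_VELOCITY_2))
```

The same factor-of-10 pattern holds for the link lengths, link masses and `link_moi`, whose default of 1.0 is bounded by 0.1 and 10.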

CARL MountainCar Environment

The MountainCar environment asks the agent to move a car up a steep slope. In order to succeed, the agent first has to build up momentum on the opposite slope. There are two versions of the environment: a discrete one, whose actions push the car to the left or to the right, and a continuous one.

[Figure: Influence of context settings on an agent trained on the default environment.]

Defaults and bounds for the discrete MountainCar:

Defaults and Bounds

| Context Feature    | Default | Bounds                         |
|--------------------|---------|--------------------------------|
| min_position       | -1.2    | (-inf, inf, <class 'float'>)   |
| max_position       | 0.6     | (-inf, inf, <class 'float'>)   |
| max_speed          | 0.07    | (0, inf, <class 'float'>)      |
| goal_position      | 0.5     | (-inf, inf, <class 'float'>)   |
| goal_velocity      | 0.0     | (-inf, inf, <class 'float'>)   |
| force              | 0.001   | (-inf, inf, <class 'float'>)   |
| gravity            | 0.0025  | (0, inf, <class 'float'>)      |
| start_position     | -0.5    | (-1.5, 0.5, <class 'float'>)   |
| start_position_std | 0.1     | (0.1, inf, <class 'float'>)    |
| start_velocity     | 0.0     | (-inf, inf, <class 'float'>)   |
| start_velocity_std | 0.0     | (0.1, inf, <class 'float'>)    |
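The interplay of `force`, `gravity` and the position and speed limits can be sketched as a single update step in the style of gym's discrete MountainCar. This is an illustration under assumptions, not CARL's code: the function name `mountain_car_step` is hypothetical, and the action encoding (0 = push left, 1 = no push, 2 = push right) is gym's convention.

```python
import math

# Default context for the discrete MountainCar, copied from the table
# (start-state features omitted because this sketch only covers the step).
DEFAULT_CONTEXT = {
    "min_position": -1.2,
    "max_position": 0.6,
    "max_speed": 0.07,
    "goal_position": 0.5,
    "goal_velocity": 0.0,
    "force": 0.001,
    "gravity": 0.0025,
}


def mountain_car_step(position, velocity, action, context=DEFAULT_CONTEXT):
    """One gym-style update; action is 0 (left), 1 (no push) or 2 (right)."""
    c = context
    # The engine contributes +/- force; the slope contributes a gravity term.
    velocity += (action - 1) * c["force"] - math.cos(3 * position) * c["gravity"]
    velocity = max(-c["max_speed"], min(c["max_speed"], velocity))

    position += velocity
    position = max(c["min_position"], min(c["max_position"], position))

    # Hitting the left wall stops the car.
    if position == c["min_position"] and velocity < 0:
        velocity = 0.0

    done = position >= c["goal_position"] and velocity >= c["goal_velocity"]
    return position, velocity, done


print(mountain_car_step(-0.5, 0.0, 2))
```

With the default values, `force` (0.001) is smaller than the largest gravity term (0.0025), which is why pushing right alone is not enough and the car has to use the opposite slope. Raising `force` or lowering `gravity` in the context changes that balance.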

And for the continuous case:

Defaults and Bounds

| Context Feature    | Default | Bounds                         |
|--------------------|---------|--------------------------------|
| min_position       | -1.2    | (-inf, inf, <class 'float'>)   |
| max_position       | 0.6     | (-inf, inf, <class 'float'>)   |
| max_speed          | 0.07    | (0, inf, <class 'float'>)      |
| goal_position      | 0.45    | (-inf, inf, <class 'float'>)   |
| goal_velocity      | 0.0     | (-inf, inf, <class 'float'>)   |
| power              | 0.0015  | (-inf, inf, <class 'float'>)   |
| min_position_start | -0.6    | (-inf, inf, <class 'float'>)   |
| max_position_start | -0.4    | (-inf, inf, <class 'float'>)   |
| min_velocity_start | 0.0     | (-inf, inf, <class 'float'>)   |
| max_velocity_start | 0.0     | (-inf, inf, <class 'float'>)   |
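The four `*_start` features read naturally as ranges for the initial state. Below is a minimal sketch, assuming the start position and velocity are drawn uniformly from those ranges (gym's continuous MountainCar draws its start position uniformly from [-0.6, -0.4]). The function name `sample_start_state` is hypothetical and not taken from CARL.

```python
import random

# Start-state features of the continuous MountainCar, copied from the table.
DEFAULT_START = {
    "min_position_start": -0.6,
    "max_position_start": -0.4,
    "min_velocity_start": 0.0,
    "max_velocity_start": 0.0,
}


def sample_start_state(context=DEFAULT_START, rng=None):
    """Draw (position, velocity) uniformly from the context's start ranges.

    Assumption: uniform sampling; how CARL draws the start state may differ.
    """
    if rng is None:
        rng = random.Random()
    position = rng.uniform(
        context["min_position_start"], context["max_position_start"]
    )
    velocity = rng.uniform(
        context["min_velocity_start"], context["max_velocity_start"]
    )
    return position, velocity


# A seeded generator makes the draw reproducible.
print(sample_start_state(rng=random.Random(0)))
```

With the defaults, the velocity range collapses to a single point, so every episode starts at rest; widening the range in the context would add variation to the initial velocity as well.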