Mighty Inner Loops#
A key motivation for Mighty is to make it easy to create systems that interact with the RL loop. Systems that operate during an algorithm's runtime are inner-loop components; in Mighty, we call them Meta Components. This page documents their structure and why they're so useful.
What Are Meta Components?#
Meta components are elements that interact with the main loop at various points. Within this interaction, they have access to virtually all current internal information and can adapt it. This means everything from hyperparameter scheduling to learning a separate dynamics model and more is possible within this structure. Meta components can be stacked on top of one another to combine different approaches in a single run. This enables complex inner loop setups without entangling methods in code.
The Metrics Dictionary#
The most important part when adding components is the 'metrics' dictionary. It is Mighty's central information hub: here you can find transitions, losses, predictions, batches and parameters - everything you need to build methods that actively work with the RL loop. For an example of how it is used, check out our RND implementation:
def get_reward(self, metrics):
    """Compute the RND intrinsic reward on step and add it to the transition.

    :param metrics: Dict of current metrics
    :return: Updated metrics dict
    """
    # Lazily initialize the RND networks once the observation shape is known.
    if self.rnd_net is None:
        self.initialize_networks(metrics["transition"]["next_state"].shape[1:])
    rnd_error = self.rnd_net.get_error(metrics["transition"]["next_state"])
    metrics["transition"]["intrinsic_reward"] = (
        self.internal_reward_weight * rnd_error
    )
    metrics["transition"]["reward"] = (
        metrics["transition"]["reward"] + self.internal_reward_weight * rnd_error
    )
    return metrics
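As a further, hedged illustration (not part of Mighty's codebase), a minimal component could use the same dictionary to rescale rewards. All names below are hypothetical:

class RewardScaler:
    """Hypothetical meta component: rescales the environment reward in place."""

    def __init__(self, scale: float = 0.1) -> None:
        self.scale = scale

    def scale_reward(self, metrics):
        """Multiply the current transition's reward by a constant factor.

        :param metrics: Dict of current metrics
        :return: Updated metrics dict
        """
        # The "transition" entry holds the current step's data,
        # just as in the RND example above.
        metrics["transition"]["reward"] = metrics["transition"]["reward"] * self.scale
        return metrics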
Interactions With The Main Loop#
Meta-components are classes with methods that can be called at different points in the learning loop. There are several call positions, and each component specifies which of them it uses:
def __init__(self) -> None:
    """Meta module init.

    :return:
    """
    # Each list holds the methods this component wants to run
    # at the corresponding point in the training loop.
    self.pre_step_methods = []
    self.post_step_methods = []
    self.pre_update_methods = []
    self.post_update_methods = []
    self.pre_episode_methods = []
    self.post_episode_methods = []
Typical applications that map onto these call positions include:
- task scheduling/curriculum learning
- hyperparameter scheduling
- intrinsic rewards
Meta-components are also stackable, i.e. you can run multiple of them per training run. In principle, you can do almost anything in a meta-component, including training additional networks or calling the policy directly. Before you default to this class, however, we recommend double-checking whether your idea is better served by a more specific class.
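As a hedged sketch of how registration might look in a custom component, the example below decays an exploration parameter after each update. The base class name 'MightyMetaComponent' and the metrics key used here are assumptions for illustration, not guaranteed parts of Mighty's API:

class EpsilonDecay(MightyMetaComponent):  # base class name is an assumption
    """Hypothetical component: decays an exploration rate after each update."""

    def __init__(self, decay: float = 0.99) -> None:
        super().__init__()
        self.decay = decay
        # Register the method at the desired call position.
        self.post_update_methods = [self.decay_epsilon]

    def decay_epsilon(self, metrics):
        """Shrink the exploration rate stored in the metrics dict.

        :param metrics: Dict of current metrics
        :return: Updated metrics dict
        """
        # "hp/pi_epsilon" is a hypothetical key; the actual hyperparameter
        # keys depend on the algorithm configuration.
        metrics["hp/pi_epsilon"] = metrics.get("hp/pi_epsilon", 1.0) * self.decay
        return metrics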
Combining Components#
When combining different components, they are stacked on top of one another, meaning they are executed in order for each call position. For meta components that interact with each other or with the same parts of the base loop, this order can be important! If you, for example, combine a curriculum based on the training reward with an intrinsic reward component, you should likely configure the curriculum to be called first to avoid basing the difficulty on the reward bonus.
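To make the ordering concrete, here is a purely illustrative sketch of how stacked components might be invoked; the runner function below is not Mighty's actual implementation:

def run_post_step(components, metrics):
    """Illustrative only: call each component's post-step hooks in stacking order."""
    for component in components:
        for method in component.post_step_methods:
            metrics = method(metrics)
    return metrics

# With components = [curriculum, rnd], the curriculum sees the raw reward;
# reversing the order would base its difficulty on reward + intrinsic bonus.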