Mighty rollout buffer
mighty.mighty_replay.mighty_rollout_buffer
Mighty rollout buffer.
MaxiBatch
MightyRolloutBuffer
MightyRolloutBuffer(
buffer_size: int,
obs_shape,
act_dim,
device: str = "auto",
gae_lambda: float = 1,
gamma: float = 0.99,
n_envs: int = 1,
)
Bases: MightyBuffer
Rollout buffer used in on-policy algorithms like A2C/PPO. Stores transitions and computes returns and advantages.
:param buffer_size: Maximum number of transitions to store.
:param obs_shape: Shape of the observation space.
:param act_dim: Dimension of the action space.
:param device: Device to store tensors on.
:param gae_lambda: Lambda parameter for GAE.
:param gamma: Discount factor.
:param n_envs: Number of parallel environments.
Source code in mighty/mighty_replay/mighty_rollout_buffer.py
__bool__
Return whether the buffer contains any transitions.
:return: True if buffer is not empty, False otherwise.
__len__
Return the total number of transitions in the buffer.
:return: Number of transitions.
add
add(rollout_batch: RolloutBatch, _)
Add a batch of transitions to the buffer.
:param rollout_batch: RolloutBatch containing transitions to add.
:param _: Unused argument (for compatibility).
compute_returns_and_advantage
Compute returns and advantages using Generalized Advantage Estimation (GAE).
:param last_values: Value estimates for the last observation of each environment (shape: [n_envs]).
:param dones: Done flags for the last step of each environment (shape: [n_envs]).
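As an illustration of what a GAE pass over a rollout typically looks like, the following is a minimal NumPy sketch. It is not taken from the Mighty source: the function name `compute_gae`, the `[n_steps, n_envs]` array layout, and the convention that `episode_starts[t]` marks step `t` as the first step of a new episode are all assumptions made for the example. Note that with `gae_lambda = 1` (the constructor default shown above) the advantage reduces to the discounted, bootstrapped return minus the value baseline.

```python
import numpy as np


def compute_gae(rewards, values, episode_starts, last_values, dones,
                gamma=0.99, gae_lambda=1.0):
    """Hypothetical helper, not the Mighty implementation.

    rewards, values, episode_starts: arrays of shape [n_steps, n_envs].
    last_values, dones: arrays of shape [n_envs].
    Returns (returns, advantages), both of shape [n_steps, n_envs].
    """
    n_steps = rewards.shape[0]
    advantages = np.zeros_like(rewards, dtype=np.float64)
    last_gae = np.zeros_like(last_values, dtype=np.float64)

    # Walk backwards so each step can reuse the advantage of its successor.
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            # Bootstrap from the value of the observation after the rollout.
            next_non_terminal = 1.0 - dones.astype(np.float64)
            next_values = last_values
        else:
            # Assumed convention: a start flag at step + 1 means step was terminal.
            next_non_terminal = 1.0 - episode_starts[step + 1].astype(np.float64)
            next_values = values[step + 1]

        # One-step TD error, masked at episode boundaries.
        delta = rewards[step] + gamma * next_values * next_non_terminal - values[step]
        last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
        advantages[step] = last_gae

    # The return target for the value function is advantage plus baseline.
    returns = advantages + values
    return returns, advantages
```

Masking with `next_non_terminal` prevents value estimates and advantages from leaking across episode boundaries when several episodes share one rollout.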
reset
Reset the buffer by clearing all stored transitions.
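The container behaviour described for `__bool__`, `__len__`, `add`, and `reset` can be pictured with a toy stand-in. The class below is purely hypothetical (`TinyRolloutStore` is not part of Mighty) and stores only rewards; it also assumes that each environment's step counts as one transition, which the documentation does not state explicitly.

```python
import numpy as np


class TinyRolloutStore:
    """Hypothetical stand-in that mirrors the documented container methods."""

    def __init__(self):
        self._rewards = []

    def add(self, rewards, _=None):
        # The second argument is ignored, mirroring the unused `_` parameter.
        self._rewards.append(np.asarray(rewards, dtype=np.float64))

    def __len__(self):
        # Assumption: every element of every stored array is one transition.
        return sum(chunk.size for chunk in self._rewards)

    def __bool__(self):
        # Non-empty buffers are truthy, empty ones falsy.
        return len(self) > 0

    def reset(self):
        # Drop everything so the next rollout starts from an empty buffer.
        self._rewards.clear()
```

In an on-policy loop the buffer is usually reset after each policy update, since transitions collected under the old policy are no longer valid training data.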
sample
sample(batch_size: int)
Sample mini-batches of transitions from the buffer.
:param batch_size: Number of transitions per mini-batch.
:return: List of RolloutBatch samples.
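One common way to turn a stored rollout into mini-batches is to shuffle the transition indices once and slice the permutation into chunks. The sketch below shows that idea with plain dictionaries of NumPy arrays; the helper name `split_into_minibatches`, the shuffling, and the shorter final batch are assumptions for illustration, and the real `sample` method returns `RolloutBatch` objects rather than dictionaries.

```python
import numpy as np


def split_into_minibatches(arrays, batch_size, rng=None):
    """Hypothetical helper: shuffle once, then slice into mini-batches.

    arrays: dict mapping field names to arrays that share the same first axis.
    Returns a list of dicts with the same keys, one per mini-batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_transitions = len(next(iter(arrays.values())))
    permutation = rng.permutation(n_transitions)

    batches = []
    for start in range(0, n_transitions, batch_size):
        idx = permutation[start:start + batch_size]
        # Index every field with the same indices so rows stay aligned.
        batches.append({name: arr[idx] for name, arr in arrays.items()})
    return batches
```

Indexing every field with the same index array keeps each observation paired with its own action, advantage, and return. The last batch may be smaller than `batch_size` when the number of transitions is not a multiple of it.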
save
Save the buffer to a file.
:param filename: Path to the file where the buffer will be saved.
RolloutBatch
RolloutBatch(
observations,
actions,
rewards,
advantages,
returns,
episode_starts,
log_probs,
values,
)
:param observations: Numpy array of observations.
:param actions: Numpy array of actions.
:param rewards: Numpy array of rewards.
:param advantages: Numpy array of advantages.
:param returns: Numpy array of returns.
:param episode_starts: Numpy array indicating episode starts.
:param log_probs: Numpy array of log probabilities.
:param values: Numpy array of value estimates.
__iter__
Iterate over the transitions in the batch.
:yield: Tuples of (observation, action, reward, advantage, return, episode_start, log_prob, value).
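To make the iteration order concrete, here is a small self-contained stand-in that yields per-transition tuples in the documented field order. `RolloutBatchSketch` is a hypothetical dataclass written for this example, not the library's `RolloutBatch`, and it assumes all fields share the same first axis.

```python
from dataclasses import dataclass, fields

import numpy as np


@dataclass
class RolloutBatchSketch:
    """Hypothetical stand-in with the same eight fields as RolloutBatch."""

    observations: np.ndarray
    actions: np.ndarray
    rewards: np.ndarray
    advantages: np.ndarray
    returns: np.ndarray
    episode_starts: np.ndarray
    log_probs: np.ndarray
    values: np.ndarray

    def __len__(self):
        return len(self.observations)

    def __iter__(self):
        # zip walks the first axis of every field in declaration order, so each
        # tuple is (observation, action, reward, advantage, return,
        # episode_start, log_prob, value) for one transition.
        yield from zip(*(getattr(self, f.name) for f in fields(self)))


batch = RolloutBatchSketch(
    observations=np.zeros((2, 4)),
    actions=np.array([0, 1]),
    rewards=np.array([1.0, 0.5]),
    advantages=np.array([0.2, -0.1]),
    returns=np.array([1.2, 0.4]),
    episode_starts=np.array([1.0, 0.0]),
    log_probs=np.array([-0.7, -0.6]),
    values=np.array([1.0, 0.5]),
)

for obs, action, reward, advantage, ret, start, log_prob, value in batch:
    print(action, reward, advantage)
```

Unpacking in the loop header only works if the tuple order matches the field order, which is why the sketch derives the order from the dataclass fields instead of listing it twice.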