Q learning
mighty.mighty_update.q_learning
Q-learning update (modified to accept nested optimizer_kwargs and max_grad_norm).
ClippedDoubleQLearning

ClippedDoubleQLearning(
    model,
    gamma: float,
    optimizer_class=Adam,
    optimizer_kwargs: dict | None = None,
    max_grad_norm: float | None = None,
)
Bases: QLearning
Clipped Double Q-learning update.
Source code in mighty/mighty_update/q_learning.py
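For orientation, here is a minimal sketch of how a clipped double Q-learning target is typically formed in a discrete-action setting: a greedy action is chosen with one target network, and the bootstrapped value is the elementwise minimum of both target networks' estimates, which counteracts overestimation bias. The helper name and the q1_target / q2_target networks are hypothetical, not part of Mighty's API; the actual computation lives in mighty/mighty_update/q_learning.py.

```python
import torch

def clipped_double_q_target(reward, done, next_obs, q1_target, q2_target, gamma):
    # reward, done: float tensors of shape [batch]; next_obs: [batch, obs_dim].
    with torch.no_grad():
        # Greedy next action according to the first target network.
        next_action = q1_target(next_obs).argmax(dim=-1, keepdim=True)
        # Evaluate that action with both networks and keep the smaller value,
        # clipping the optimistic bias of a single bootstrapped estimate.
        q1 = q1_target(next_obs).gather(-1, next_action).squeeze(-1)
        q2 = q2_target(next_obs).gather(-1, next_action).squeeze(-1)
        next_value = torch.min(q1, q2)
    return reward + gamma * (1.0 - done) * next_value
```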
apply_update
Apply the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
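As a rough sketch of what an update step of this shape involves (a regression of current Q-values onto detached targets, plus the optional gradient clipping implied by the max_grad_norm constructor argument), with hypothetical names throughout; the real method's signature may differ:

```python
import torch

def apply_update_sketch(q_net, optimizer, predictions, targets, max_grad_norm=None):
    # Regress current Q-value predictions toward the (detached) TD targets.
    loss = torch.nn.functional.mse_loss(predictions, targets.detach())
    optimizer.zero_grad()
    loss.backward()
    # Optional gradient-norm clipping, mirroring max_grad_norm in the constructor.
    if max_grad_norm is not None:
        torch.nn.utils.clip_grad_norm_(q_net.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```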
td_error
Compute the TD error for the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
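Across all three classes the TD error has the same generic shape: the bootstrapped target y_t (produced by get_targets, and variant-specific) minus the current estimate:

```latex
\delta_t = y_t - Q(s_t, a_t), \qquad
y_t = r_t + \gamma \, (1 - d_t) \, V(s_{t+1})
```

Here V(s_{t+1}) is the variant-specific bootstrap value: a plain max over next-state action values for QLearning, a target-network evaluation of the online network's argmax for DoubleQLearning, and a clipped minimum of two estimates for ClippedDoubleQLearning.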
DoubleQLearning

DoubleQLearning(
    model,
    gamma: float,
    optimizer_class=Adam,
    optimizer_kwargs: dict | None = None,
    max_grad_norm: float | None = None,
)
Bases: QLearning
Double Q-learning update.
Source code in mighty/mighty_update/q_learning.py
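Double Q-learning decouples action selection from action evaluation: the online network picks the greedy next action, and the target network scores it. A minimal sketch with hypothetical q_online / q_target names (not Mighty's API):

```python
import torch

def double_q_target(reward, done, next_obs, q_online, q_target, gamma):
    with torch.no_grad():
        # Action selection by the online network ...
        next_action = q_online(next_obs).argmax(dim=-1, keepdim=True)
        # ... but evaluation by the target network, reducing overestimation.
        next_value = q_target(next_obs).gather(-1, next_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_value
```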
apply_update
Apply the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
td_error
Compute the TD error for the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
QLearning

QLearning(
    model,
    gamma: float,
    optimizer_class=Adam,
    optimizer_kwargs: dict | None = None,
    max_grad_norm: float | None = None,
)
Q-learning update.
:param model: The Q-network to optimize.
:param gamma: Discount factor.
:param optimizer_class: Optimizer class (e.g. torch.optim.Adam).
:param optimizer_kwargs: Keyword args to pass to the optimizer.
:param max_grad_norm: If provided, gradient norms will be clipped to this value.
Source code in mighty/mighty_update/q_learning.py
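A minimal construction sketch, using only the parameters shown in the signature above; the placeholder network and hyperparameter values are illustrative, and the import path is taken from the source-code reference:

```python
import torch
from torch.optim import Adam

from mighty.mighty_update.q_learning import QLearning

# Placeholder Q-network for illustration: observations -> per-action values.
q_net = torch.nn.Linear(4, 2)

update = QLearning(
    model=q_net,
    gamma=0.99,
    optimizer_class=Adam,
    optimizer_kwargs={"lr": 1e-3},  # forwarded to the optimizer (nested kwargs)
    max_grad_norm=10.0,             # gradient norms clipped to this value
)
```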
apply_update
Apply the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
get_targets
Get targets for the Q-learning update.
Source code in mighty/mighty_update/q_learning.py
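For the vanilla update, the target bootstraps with the maximum over next-state action values; a sketch with a hypothetical q_target network (the actual signature of get_targets may differ):

```python
import torch

def vanilla_q_target(reward, done, next_obs, q_target, gamma):
    # Standard Q-learning: bootstrap with the best next-state action value.
    with torch.no_grad():
        next_value = q_target(next_obs).max(dim=-1).values
    return reward + gamma * (1.0 - done) * next_value
```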
td_error
Compute the TD error for the Q-learning update.