Decaying epsilon greedy
mighty.mighty_exploration.decaying_epsilon_greedy
Decaying Epsilon-Greedy Exploration.
DecayingEpsilonGreedy
DecayingEpsilonGreedy(
algo,
model,
epsilon_start: float = 1.0,
epsilon_final: float = 0.01,
epsilon_decay_steps: int = 10000,
)
Bases: EpsilonGreedy
Epsilon-Greedy Exploration with linear decay schedule.
:param epsilon_start: Initial ε (at time step 0)
:param epsilon_final: Final ε (after epsilon_decay_steps)
:param epsilon_decay_steps: Number of steps over which to linearly decay ε from epsilon_start to epsilon_final.
Source code in mighty/mighty_exploration/decaying_epsilon_greedy.py
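The decay itself is a plain linear interpolation from epsilon_start to epsilon_final over epsilon_decay_steps. A minimal, self-contained sketch of such a schedule (the clamping after the decay window is an assumption for illustration, not taken from the library source):

```python
def decayed_epsilon(
    step: int,
    epsilon_start: float = 1.0,
    epsilon_final: float = 0.01,
    epsilon_decay_steps: int = 10000,
) -> float:
    """Linearly interpolate epsilon from epsilon_start to epsilon_final."""
    frac = min(max(step / epsilon_decay_steps, 0.0), 1.0)  # progress through the schedule, clamped to [0, 1]
    return epsilon_start + frac * (epsilon_final - epsilon_start)


print(decayed_epsilon(0))      # 1.0
print(decayed_epsilon(5000))   # 0.505
print(decayed_epsilon(20000))  # ~0.01 (held at epsilon_final after the decay window)
```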
__call__
Get action.
:param s: state
:param return_logp: return logprobs
:param metrics: current metric dict
:param eval: eval mode
:return: action or (action, logprobs)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
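The return contract (an action, or a tuple of action and log-probabilities when return_logp=True) can be illustrated with a small stand-in that only mimics the documented signature; it is not the library's implementation, and the uniform log-probabilities are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)


def call_policy(s: np.ndarray, return_logp: bool = False, metrics: dict | None = None, eval: bool = False):
    """Stand-in mirroring the documented interface: returns action or (action, logprobs)."""
    n_actions = 3  # placeholder action space; metrics is ignored in this stand-in
    if eval:
        action = np.zeros(len(s), dtype=np.int64)           # deterministic placeholder for eval mode
    else:
        action = rng.integers(n_actions, size=len(s))       # exploratory placeholder
    if return_logp:
        logprobs = np.full(len(s), np.log(1.0 / n_actions))  # illustrative uniform log-probs
        return action, logprobs
    return action


state = np.zeros((2, 4), dtype=np.float32)  # batch of two placeholder observations
action, logprobs = call_policy(state, return_logp=True)
```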
explore
Explore.
:param s: state
:param return_logp: return logprobs
:param _: not used
:return: action or (action, logprobs)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
explore_func
Same as EpsilonGreedy, except it uses the decayed ε each time.
Source code in mighty/mighty_exploration/decaying_epsilon_greedy.py
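The underlying selection rule is standard ε-greedy, just evaluated with whatever value the schedule currently yields. A minimal sketch with a placeholder Q-vector (not the library's code):

```python
import numpy as np

rng = np.random.default_rng(0)


def epsilon_greedy_action(q_values: np.ndarray, epsilon: float) -> int:
    """With probability epsilon take a uniform random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


q_values = np.array([0.1, 0.7, 0.2])
for epsilon in (1.0, 0.5, 0.01):  # roughly: start, middle, and end of the decay schedule
    print(epsilon, epsilon_greedy_action(q_values, epsilon))
```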
get_random_actions
Override to recompute ε at each call, then delegate to EpsilonGreedy's logic.
Source code in mighty/mighty_exploration/decaying_epsilon_greedy.py
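The override pattern described here (recompute ε from a step counter, then reuse the parent's random-action logic) looks roughly like the following; the class, attribute, and method names are assumptions for illustration, not the library's actual code:

```python
import numpy as np


class EpsilonGreedyStandIn:
    """Stand-in parent: decides which batch elements explore using the current self.epsilon."""

    def __init__(self, n_actions: int, epsilon: float = 1.0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = np.random.default_rng(0)

    def get_random_actions(self, batch_size: int):
        explore_mask = self.rng.random(batch_size) < self.epsilon        # which elements explore
        random_actions = self.rng.integers(self.n_actions, size=batch_size)
        return explore_mask, random_actions


class DecayingStandIn(EpsilonGreedyStandIn):
    """Recompute epsilon from a step counter, then delegate to the parent."""

    def __init__(self, n_actions: int, epsilon_start: float = 1.0,
                 epsilon_final: float = 0.01, epsilon_decay_steps: int = 10000):
        super().__init__(n_actions, epsilon_start)
        self.epsilon_start = epsilon_start
        self.epsilon_final = epsilon_final
        self.epsilon_decay_steps = epsilon_decay_steps
        self.steps = 0

    def get_random_actions(self, batch_size: int):
        frac = min(self.steps / self.epsilon_decay_steps, 1.0)
        self.epsilon = self.epsilon_start + frac * (self.epsilon_final - self.epsilon_start)
        self.steps += batch_size
        return super().get_random_actions(batch_size)
```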
sample_func_logits
state_np: np.ndarray of shape [batch, obs_dim]
Returns: (action_tensor, log_prob_tensor)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
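Sampling an action and its log-probability from policy logits is typically done with a categorical distribution; a hedged sketch assuming a PyTorch model mapping observations to logits (the network here is a placeholder, not the library's):

```python
import numpy as np
import torch
from torch.distributions import Categorical

model = torch.nn.Linear(4, 3)                   # placeholder policy: obs_dim=4, n_actions=3

state_np = np.zeros((2, 4), dtype=np.float32)   # [batch, obs_dim]
logits = model(torch.from_numpy(state_np))      # [batch, n_actions]
dist = Categorical(logits=logits)
action_tensor = dist.sample()                   # [batch]
log_prob_tensor = dist.log_prob(action_tensor)  # [batch]
```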
sample_func_q
Q-learning branch:
• state_np: np.ndarray of shape [batch, obs_dim]
• model(state) returns Q-values: tensor [batch, n_actions]
We choose action = argmax(Q), and also return the full Q-vector.
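A matching sketch for the Q-learning branch, again with a placeholder PyTorch network standing in for model (not the library's code):

```python
import numpy as np
import torch

model = torch.nn.Linear(4, 3)                  # placeholder Q-network: obs_dim=4, n_actions=3

state_np = np.zeros((2, 4), dtype=np.float32)  # [batch, obs_dim]
q_values = model(torch.from_numpy(state_np))   # [batch, n_actions]
action_tensor = q_values.argmax(dim=-1)        # greedy action per batch element
# the chosen actions are returned together with the full Q-vector (q_values)
```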