EZ-Greedy
mighty.mighty_exploration.ez_greedy
#
EZ-Greedy Exploration.
EZGreedy
#
Bases: EpsilonGreedy
EZ-Greedy (temporally-extended epsilon-greedy) exploration.
:param algo: algorithm name
:param model: model
:param epsilon: exploration epsilon
:param zipf_param: parametrizes the Zipf distribution for skip durations
:return:
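The idea behind EZ-greedy can be sketched in a few lines: with probability `epsilon`, draw a random action and a repeat duration from a Zipf distribution, then replay that action for the sampled number of steps. The class and attribute names below are hypothetical stand-ins, not Mighty's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class EZGreedySketch:
    """Minimal sketch of temporally-extended epsilon-greedy (EZ-greedy).

    Hypothetical illustration only; names and structure are assumptions,
    not the mighty.mighty_exploration.EZGreedy implementation.
    """

    def __init__(self, epsilon=0.1, zipf_param=2.0, n_actions=4):
        self.epsilon = epsilon
        self.zipf_param = zipf_param  # must be > 1 for numpy's zipf sampler
        self.n_actions = n_actions
        self.skip = 0                 # remaining steps to repeat the action
        self.skipped_action = None

    def explore(self, greedy_action):
        if self.skip > 0:
            # Still inside a skip: repeat the previously sampled action.
            self.skip -= 1
            return self.skipped_action
        if rng.random() < self.epsilon:
            # Start a new skip: sample duration n ~ Zipf(zipf_param)
            # and a uniform random action to repeat for n steps.
            self.skip = int(rng.zipf(self.zipf_param)) - 1
            self.skipped_action = int(rng.integers(self.n_actions))
            return self.skipped_action
        return greedy_action
```

With `epsilon=0` this degenerates to pure greedy action selection; larger `zipf_param` values make long repeats rarer.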
Source code in mighty/mighty_exploration/ez_greedy.py
__call__
#
Get action.
:param s: state
:param return_logp: return logprobs
:param metrics: current metric dict
:param eval: eval mode
:return: action or (action, logprobs)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
explore
#
Explore.
:param s: state
:param return_logp: return logprobs
:param _: not used
:return: action or (action, logprobs)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
sample_func_logits
#
:param state_np: np.ndarray of shape [batch, obs_dim]
:return: (action_tensor, log_prob_tensor)
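The logits branch amounts to sampling from a categorical distribution and recording the log-probability of the sampled action. A numpy-only sketch of that behavior (the real implementation presumably uses torch tensors and distributions; `sample_from_logits` is a hypothetical name):

```python
import numpy as np


def sample_from_logits(logits):
    """Sample actions from categorical logits; return (actions, log_probs).

    logits: np.ndarray of shape [batch, n_actions].
    Numpy sketch only; not the library's torch-based sample_func_logits.
    """
    # Numerically stable softmax over the action dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    actions = np.array([rng.choice(len(p), p=p) for p in probs])
    log_probs = np.log(probs[np.arange(len(actions)), actions])
    return actions, log_probs
```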
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
sample_func_q
#
Q-learning branch
- state_np: np.ndarray of shape [batch, obs_dim]
- model(state) returns Q-values: tensor [batch, n_actions]

We choose action = argmax(Q) and also return the full Q-vector.
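The Q-learning branch described above reduces to a row-wise argmax over the model's Q-values. A minimal numpy sketch, assuming `q_model` is any callable mapping `[batch, obs_dim]` to `[batch, n_actions]` (the function name is a hypothetical stand-in):

```python
import numpy as np


def sample_func_q_sketch(state_np, q_model):
    """Greedy Q-learning branch: argmax-Q action plus the full Q-vector.

    Sketch only; the library's sample_func_q operates on torch tensors.
    """
    q_values = q_model(state_np)        # shape [batch, n_actions]
    actions = q_values.argmax(axis=1)   # greedy action per batch row
    return actions, q_values
```

Returning the full Q-vector alongside the action lets callers compute log-probabilities or other statistics without a second forward pass.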