Epsilon greedy
mighty.mighty_exploration.epsilon_greedy
Epsilon Greedy Exploration.
EpsilonGreedy
Bases: MightyExplorationPolicy
Epsilon Greedy Exploration.
:param algo: algorithm name
:param func: policy function
:param epsilon: exploration epsilon
:param env: environment
Source code in mighty/mighty_exploration/epsilon_greedy.py
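The rule itself is simple: with probability epsilon, act uniformly at random; otherwise act greedily. A minimal standalone sketch of that rule (illustrative only, not Mighty's source; `q_values` stands in for whatever scores the wrapped policy function produces):

```python
import numpy as np

def epsilon_greedy_action(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Return a random action with probability epsilon, else the greedy one.

    q_values: shape [n_actions], the current score for each action.
    """
    if rng.random() < epsilon:
        # Explore: pick an action uniformly at random.
        return int(rng.integers(q_values.shape[0]))
    # Exploit: pick the highest-scoring action.
    return int(np.argmax(q_values))

rng = np.random.default_rng(seed=0)
action = epsilon_greedy_action(np.array([0.1, 0.5, 0.2]), epsilon=0.1, rng=rng)
```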
__call__
Get action.
:param s: state
:param return_logp: return log probabilities
:param metrics: current metrics dict
:param eval: evaluation mode
:return: action or (action, log probabilities)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
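A hypothetical call site, assuming `policy` is a constructed EpsilonGreedy instance and `s` is a batch of states; only the keyword names come from the signature above:

```python
# Hypothetical usage; `policy` and `s` are placeholders.
action = policy(s)                           # action only
action, logp = policy(s, return_logp=True)   # action plus log probabilities
action = policy(s, eval=True)                # evaluation mode
```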
explore
Explore.
:param s: state
:param return_logp: return log probabilities
:param _: not used
:return: action or (action, log probabilities)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
sample_func_logits
state_np: np.ndarray of shape [batch, obs_dim]
Returns: (action_tensor, log_prob_tensor)
Source code in mighty/mighty_exploration/mighty_exploration_policy.py
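For the policy-gradient branch, sampling from logits usually means building a categorical distribution and querying its log-probability. A sketch under that assumption (PyTorch; `model` is any callable mapping [batch, obs_dim] states to [batch, n_actions] logits, matching the documented return of (action_tensor, log_prob_tensor)):

```python
import torch
from torch.distributions import Categorical

def sample_from_logits(model, state_np):
    """Sample discrete actions from policy logits, returning (action, log_prob)."""
    state = torch.as_tensor(state_np, dtype=torch.float32)  # [batch, obs_dim]
    logits = model(state)                                   # [batch, n_actions]
    dist = Categorical(logits=logits)                       # softmax over actions
    action = dist.sample()                                  # [batch]
    log_prob = dist.log_prob(action)                        # [batch]
    return action, log_prob
```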
sample_func_q
Q-learning branch.
• state_np: np.ndarray of shape [batch, obs_dim]
• model(state) returns Q-values: tensor [batch, n_actions]
We choose action = argmax(Q) and also return the full Q-vector.
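Put together, the branch described above amounts to a forward pass plus an argmax. A sketch under the same assumptions (PyTorch; `model` is the Q-network):

```python
import torch

def sample_greedy_q(model, state_np):
    """Greedy action selection for the Q-learning branch, as documented above."""
    state = torch.as_tensor(state_np, dtype=torch.float32)  # [batch, obs_dim]
    q_values = model(state)                                 # [batch, n_actions]
    action = q_values.argmax(dim=-1)                        # [batch], argmax(Q)
    return action, q_values                                 # full Q-vector alongside the action
```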