The bandits submodule
Contents:
- class multibeep.bandits.arm_info
  Bases: object
  - estimated_mean – ‘float_t’
  - estimated_variance – ‘float_t’
  - identifier – ‘unsigned int’
  - index – ‘unsigned int’
  - is_active – ‘bool’
  - name – ‘string’
  - num_pulls – ‘long double’
  - p_max – ‘float_t’
  - posterior – multibeep.util.posterior_class
  - real_mean – ‘float_t’
  - real_variance – ‘float_t’
  - rewards – ‘vector[float_t]’
- class multibeep.bandits.base
  Bases: object
  Base class for all bandits.
  It contains the functionality common to all subclasses. To access the arm_info objects of the contained arms, use the index operator ‘[]’.
  - add_arm(self, base arm)
    Adds an arm to the bandit.
    Parameters: arm (multibeep.arms.base) – an instantiated arm
    Returns: unsigned int – the unique identifier associated with the arm just added
  - deactivate_by_confidence_gap(self, float_t delta, bool consider_inactive_arms=True)
    Deactivates arms by comparing the confidence bounds computed from their posteriors.
    Parameters:
    - delta (double) – determines the size of the confidence bounds, which are computed as the delta/2 and 1-delta/2 quantiles, respectively
    - consider_inactive_arms (bool) – if True, the bounds of the inactive arms are also considered when finding the largest lower bound
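The quantile-based bounds described above can be illustrated with a small standalone sketch. It does not call multibeep. It assumes Gaussian posteriors (via the standard library's `statistics.NormalDist`) and assumes that an arm is dropped when its upper bound falls below the largest lower bound; the helper name `confidence_gap_survivors` is hypothetical.

```python
from statistics import NormalDist


def confidence_gap_survivors(posteriors, delta):
    """Return the identifiers that survive a confidence-gap test.

    posteriors maps a (hypothetical) arm identifier to a NormalDist.
    Assumption: an arm is dropped when its upper bound lies below the
    largest lower bound among all arms.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Bounds are the delta/2 and 1 - delta/2 quantiles of each posterior.
    lower = {k: p.inv_cdf(delta / 2) for k, p in posteriors.items()}
    upper = {k: p.inv_cdf(1 - delta / 2) for k, p in posteriors.items()}
    best_lower = max(lower.values())
    return [k for k in posteriors if upper[k] >= best_lower]


example = {
    0: NormalDist(1.0, 0.1),
    1: NormalDist(0.0, 0.1),
    2: NormalDist(0.9, 0.1),
}
survivors = confidence_gap_survivors(example, delta=0.05)
```

A smaller delta widens the bounds, so fewer arms are dropped for the same posteriors.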
  - deactivate_by_identifier(self, unsigned int ident)
    Deactivates an arm based on its unique identifier.
    Parameters: ident (unsigned int) – the identifier that should be deactivated. In contrast to indices, the identifiers are constant over the lifetime of a bandit.
  - deactivate_by_index(self, unsigned int index)
    Deactivates an arm based on its current index.
    Parameters: index (unsigned int) – the index that should be deactivated. Note that the indices of other arms might change, so do not call this repeatedly with indices obtained beforehand. To deactivate multiple arms, use deactivate_by_identifier.
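Why indices are unsafe for repeated deactivation can be shown with a toy model. The class below is a hypothetical stand-in, not the multibeep implementation: it assumes that active arms are stored first and that deactivating an arm swaps it with the last active one, which is one plausible way for indices to shift.

```python
class TinyBandit:
    """Hypothetical stand-in that only tracks arm identifiers."""

    def __init__(self):
        self.arms = []        # identifiers, active arms stored first
        self.num_active = 0

    def add_arm(self):
        ident = len(self.arms)              # identifiers are never reused
        self.arms.insert(self.num_active, ident)
        self.num_active += 1
        return ident

    def active(self):
        return self.arms[:self.num_active]

    def deactivate_by_index(self, index):
        if not 0 <= index < self.num_active:
            raise IndexError("no active arm at this index")
        last = self.num_active - 1
        # Assumed strategy: swap with the last active arm, then shrink.
        self.arms[index], self.arms[last] = self.arms[last], self.arms[index]
        self.num_active -= 1

    def deactivate_by_identifier(self, ident):
        self.deactivate_by_index(self.arms.index(ident))


bandit = TinyBandit()
for _ in range(4):
    bandit.add_arm()
bandit.deactivate_by_index(0)
# Index 0 now refers to a different arm than before the call.
arm_now_at_index_zero = bandit.arms[0]
```

In this model, removing arms 0 and 3 by identifier works regardless of how the positions move, whereas index 3 no longer exists after the first deactivation.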
  - deactivate_n_worst(self, unsigned int n)
    Deactivates arms solely based on their estimated mean.
    Parameters: n (unsigned int) – the number of arms to deactivate
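As a rough illustration of selecting arms purely by estimated mean, the sketch below ranks identifiers in descending order of their mean and splits off the n lowest. It is standalone Python, not multibeep code; `split_n_worst` and the dictionary layout are assumptions made for this example.

```python
def split_n_worst(estimated_means, n):
    """Split identifiers into (kept, dropped) by estimated mean.

    estimated_means maps a (hypothetical) identifier to its estimated mean.
    """
    if not 0 <= n <= len(estimated_means):
        raise ValueError("n must be between 0 and the number of arms")
    # Descending order, so the 'best' arm comes first.
    ranked = sorted(estimated_means, key=estimated_means.get, reverse=True)
    cut = len(ranked) - n
    return ranked[:cut], ranked[cut:]


means = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.1}
kept, dropped = split_n_worst(means, 2)
```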
  - min_pull_arms(self, unsigned int min_num_pulls)
    Ensures that each arm has been pulled at least a given number of times.
    Parameters: min_num_pulls (unsigned int) – minimum number of pulls required for every active arm
  - number_of_active_arms(self)
  - number_of_arms(self)
    Returns: unsigned int – total number of arms associated with the bandit
  - number_of_pulled_arms(self)
  - number_of_pulls(self)
  - pull_by_identifier(self, unsigned int ident)
    Use this function to pull an arm.
    Parameters: ident (unsigned int) – the identifier of the arm to pull
  - pull_by_index(self, unsigned int index)
    Use this function to pull an arm. Note that the index of an arm might change when an arm is deactivated.
    Parameters: index (unsigned int) – the index of the arm to pull
  - reactivate_by_identifier(self, unsigned int ident)
  - reactivate_by_index(self, unsigned int index)
  - sort_active_arms_by_mean(self)
    Sorts the remaining active arms by their estimated mean in descending order, so that the ‘best’ arm has index zero.
  - update_p_max(self, bool consider_inactive=False, float_t delta=0.01, unsigned int GL_num_points=64)
    Updates the p_max values for all arms, available in the arm_info objects.
    Parameters:
    - consider_inactive (bool) – whether or not to consider all arms during the computation
    - delta (float) – controls the confidence interval. See multibeep.util.posterior.base.support for more detail
    - GL_num_points (unsigned int) – number of points used during the Gauss-Legendre integration
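To make the roles of delta and GL_num_points more concrete, here is a minimal standalone sketch. It assumes that p_max denotes the probability that an arm's value is the largest among all arms, that all posteriors are Gaussian, and that the integration range spans the delta/2 to 1-delta/2 quantiles of all posteriors; none of this is confirmed by the documentation above. The names `gauss_legendre` and `p_max_estimates` are hypothetical.

```python
import math
from statistics import NormalDist


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess
        for _ in range(100):                            # Newton iteration
            p_prev, p = 1.0, x                          # P_0(x), P_1(x)
            for k in range(2, n + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)   # P_n'(x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def p_max_estimates(posteriors, delta=0.01, num_points=64):
    """Approximate P(arm i is the maximum) for a list of NormalDist objects."""
    # Assumed support: from the smallest delta/2 quantile to the largest
    # 1 - delta/2 quantile, so up to roughly delta of the mass is cut off.
    lo = min(p.inv_cdf(delta / 2) for p in posteriors)
    hi = max(p.inv_cdf(1 - delta / 2) for p in posteriors)
    half, mid = (hi - lo) / 2, (hi + lo) / 2
    nodes, weights = gauss_legendre(num_points)
    estimates = []
    for i, post in enumerate(posteriors):
        total = 0.0
        for x, w in zip(nodes, weights):
            t = mid + half * x
            # Density of arm i at t times P(every other arm is below t).
            below = math.prod(
                q.cdf(t) for j, q in enumerate(posteriors) if j != i
            )
            total += w * post.pdf(t) * below
        estimates.append(half * total)
    return estimates


estimates = p_max_estimates([NormalDist(1.0, 0.1), NormalDist(0.0, 0.1)])
```

Because the support is truncated, the estimates in this sketch sum to slightly less than one; a smaller delta reduces that gap at the cost of a wider integration range.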
- class multibeep.bandits.empirical
  Bases: multibeep.bandits.base
  This bandit automatically provides a Gaussian posterior based on the returned rewards. It does not provide a predictive posterior at the moment.
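One way to picture a Gaussian posterior derived from observed rewards is sketched below. The documentation does not state which Gaussian the library fits, so this standalone example assumes a Gaussian over the arm's mean, centred on the sample mean with the standard error as its standard deviation; `empirical_posterior` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean, variance


def empirical_posterior(rewards):
    """Gaussian over the mean reward, built from observed rewards.

    Assumption: sample mean as location, sample variance / n as variance.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rewards to estimate a variance")
    mean = fmean(rewards)
    variance_of_mean = variance(rewards) / n
    return NormalDist(mean, math.sqrt(variance_of_mean))


posterior = empirical_posterior([1.0, 2.0, 3.0, 4.0])
```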
- class multibeep.bandits.empirical_bandit
  Bases: multibeep.bandits.base
- class multibeep.bandits.last_n_pulls(unsigned int n=1)
  Bases: multibeep.bandits.base
  Similar to the empirical_bandit, but only the last n rewards are used to compute the empirical posterior.
  Parameters: n (unsigned int) – number of previous rewards used to compute the posterior
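The effect of restricting the estimate to the last n rewards can be sketched with a bounded window. This is a standalone illustration using `collections.deque`, not the multibeep implementation; the class name `LastNRewards` is hypothetical.

```python
from collections import deque
from statistics import fmean


class LastNRewards:
    """Keeps only the n most recent rewards of a single arm."""

    def __init__(self, n=1):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.window = deque(maxlen=n)   # older rewards fall out automatically

    def add(self, reward):
        self.window.append(reward)

    def mean(self):
        return fmean(self.window)


tracker = LastNRewards(n=3)
for reward in [10.0, 1.0, 2.0, 3.0]:
    tracker.add(reward)
recent_mean = tracker.mean()   # the early outlier 10.0 is no longer included
```

A small n reacts quickly to changing reward distributions, while a larger n gives a less noisy estimate.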
- class multibeep.bandits.last_n_pulls_bandit
  Bases: multibeep.bandits.base
- class multibeep.bandits.posterior
  Bases: multibeep.bandits.base
  This bandit requires every arm added to provide a posterior!
- class multibeep.bandits.posterior_bandit
  Bases: multibeep.bandits.base
To Do’s
- Integrate Joel’s code for p_max
- Write a method to convert an index-based vector into an identifier-based representation
- Bandit should know the best ‘true mean’ for easier regret computations (could change when adding/deactivating an arm)