The bandits submodule

class multibeep.bandits.arm_info

Bases: object

estimated_mean

estimated_mean – ‘float_t’

estimated_variance

estimated_variance – ‘float_t’

identifier

identifier – ‘unsigned int’

index

index – ‘unsigned int’

is_active

is_active – ‘bool’

name

name – ‘string’

num_pulls

num_pulls – ‘long double’

p_max

p_max – ‘float_t’

posterior

posterior – multibeep.util.posterior_class

real_mean

real_mean – ‘float_t’

real_variance

real_variance – ‘float_t’

rewards

rewards – ‘vector[float_t]’

class multibeep.bandits.base

Bases: object

Base class for all bandits.

It contains the functionality common to all subclasses. To access the arm_info objects of the contained arms, use the index operator ‘[]’.

add_arm(self, base arm)

adds an arm to the bandit

Parameters:arm (multibeep.arms.base) – an instantiated arm
Returns:unsigned int – the unique identifier associated with the arm just added
deactivate_by_confidence_gap(self, float_t delta, bool consider_inactive_arms=True)

deactivates arms based on the posteriors by comparing the confidence bounds computed from the posteriors

Parameters:
  • delta (double) – determines the width of the confidence bounds, which are computed as the delta/2 and 1-delta/2 quantiles of each posterior
  • consider_inactive_arms (bool) – If True, the bounds of the inactive arms are also considered to find the largest lower bound.
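The rule described above can be illustrated with a small standalone sketch. It does not call multibeep; it assumes Gaussian posteriors (Python's statistics.NormalDist as a stand-in) and assumes that an arm is dropped when its upper bound falls below the largest lower bound. The function name and the dictionary layout are hypothetical.

```python
from statistics import NormalDist


def confidence_gap_survivors(posteriors, active, delta,
                             consider_inactive_arms=True):
    """Return the identifiers that remain active (illustrative only).

    posteriors: dict mapping identifier -> NormalDist (assumed posterior)
    active:     set of identifiers that are currently active
    """
    # Bounds are the delta/2 and 1-delta/2 quantiles of each posterior.
    lower = {k: p.inv_cdf(delta / 2) for k, p in posteriors.items()}
    upper = {k: p.inv_cdf(1 - delta / 2) for k, p in posteriors.items()}

    # Optionally let inactive arms contribute to the largest lower bound.
    candidates = posteriors.keys() if consider_inactive_arms else active
    best_lower = max(lower[k] for k in candidates)

    # Assumed rule: keep an arm only if its upper bound reaches best_lower.
    return {k for k in active if upper[k] >= best_lower}


posteriors = {
    0: NormalDist(1.0, 0.1),
    1: NormalDist(0.9, 0.1),
    2: NormalDist(0.0, 0.1),
    3: NormalDist(2.0, 0.1),  # treated as inactive in the calls below
}
active = {0, 1, 2}
without_inactive = confidence_gap_survivors(posteriors, active, 0.05, False)
with_inactive = confidence_gap_survivors(posteriors, active, 0.05, True)
```

In this toy setup, taking the inactive arm 3 into account raises the largest lower bound, so more active arms fail the comparison than when it is ignored.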
deactivate_by_identifier(self, unsigned int ident)

deactivates an arm based on its unique identifier

Parameters:ident (unsigned int) – the identifier that should be deactivated. In contrast to indices, the identifiers are constant over the lifetime of a bandit.
deactivate_by_index(self, unsigned int index)

deactivates an arm based on its current index

Parameters:index (unsigned int) – the index that should be deactivated. Note the indices of other arms might change, so don’t use this in succession! See deactivate_by_identifier for deactivating multiple arms.
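The difference between indices and identifiers can be shown with a toy model. The class below is a hypothetical stand-in, not multibeep's implementation: it assumes that active arms are kept at the front of a list and that a deactivated arm is moved behind them, which is one way indices could shift.

```python
class ToyBandit:
    """Hypothetical bandit that only tracks arm order and activity."""

    def __init__(self, names):
        self.arms = [
            {"identifier": i, "name": name, "is_active": True}
            for i, name in enumerate(names)
        ]
        self.n_active = len(self.arms)

    def deactivate_by_index(self, index):
        if not 0 <= index < self.n_active:
            raise IndexError("index does not refer to an active arm")
        # Move the arm behind the active block; later arms shift down by one.
        arm = self.arms.pop(index)
        arm["is_active"] = False
        self.n_active -= 1
        self.arms.insert(self.n_active, arm)

    def deactivate_by_identifier(self, ident):
        # Identifiers never change, so look up the current index first.
        index = next(i for i, arm in enumerate(self.arms)
                     if arm["identifier"] == ident)
        self.deactivate_by_index(index)

    def active_names(self):
        return [arm["name"] for arm in self.arms if arm["is_active"]]


# Intention in both cases: deactivate the arms originally named "a" and "b".
by_index = ToyBandit(["a", "b", "c", "d"])
by_index.deactivate_by_index(0)
by_index.deactivate_by_index(1)  # "b" has moved to index 0, so this hits "c"

by_ident = ToyBandit(["a", "b", "c", "d"])
by_ident.deactivate_by_identifier(0)
by_ident.deactivate_by_identifier(1)
```

With successive index-based calls the second call removes the wrong arm in this model, while the identifier-based calls remove exactly the two intended arms.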
deactivate_n_worst(self, unsigned int n)

deactivates arms solely based on their estimated mean

Parameters:n (unsigned int) – the number of arms to deactivate
min_pull_arms(self, unsigned int min_num_pulls)

ensures that each arm has been pulled at least a given number of times

Parameters:min_num_pulls (unsigned int) – minimum number of pulls required for every active arm
number_of_active_arms(self)
number_of_arms(self)
Returns:unsigned int – total number of arms associated with the bandit
number_of_pulled_arms(self)
number_of_pulls(self)
pull_by_identifier(self, unsigned int ident)

use this function to pull an arm.

Parameters:ident (unsigned int) – the identifier of the arm to pull
pull_by_index(self, unsigned int index)

use this function to pull an arm. Note the index of an arm might change when an arm is deactivated.

Parameters:index (unsigned int) – the index of the arm to pull
reactivate_by_identifier(self, unsigned int ident)
reactivate_by_index(self, unsigned int index)
sort_active_arms_by_mean(self)

Sorts the remaining active arms by their estimated mean. The values are in descending order such that the ‘best’ arm has index zero.

update_p_max(self, bool consider_inactive=False, float_t delta=0.01, unsigned int GL_num_points=64)

Updates the p_max values for all arms, available in the arm_info objects

Parameters:
  • consider_inactive (bool) – whether or not to consider all arms during the computation
  • delta (float) – controls the confidence interval. See multibeep.util.posterior.base.support for more detail
  • GL_num_points (unsigned int) – number of points used during the Gauss-Legendre integration.
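As an illustration of what such a computation might look like, the sketch below estimates, for independent Gaussian posteriors, the probability that each arm has the largest value by integrating pdf_i(x) times the product of the other arms' CDFs with Gauss-Legendre quadrature. This is an assumption about the meaning of p_max, not multibeep's code. The integration range (the union of the delta/2 to 1-delta/2 intervals), the function names, and the use of statistics.NormalDist are all hypothetical choices.

```python
import math
from statistics import NormalDist


def gauss_legendre(n):
    """Nodes and weights on [-1, 1], via Newton iteration on P_n."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # initial guess
        for _ in range(100):
            p_prev, p = 1.0, x  # Legendre polynomials P_0 and P_1
            for m in range(2, n + 1):
                p_prev, p = p, ((2 * m - 1) * x * p - (m - 1) * p_prev) / m
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative of P_n
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def p_max(posteriors, delta=0.01, num_points=64):
    """Approximate P(arm i is best) for a list of NormalDist posteriors."""
    lo = min(p.inv_cdf(delta / 2) for p in posteriors)
    hi = max(p.inv_cdf(1 - delta / 2) for p in posteriors)
    half, mid = (hi - lo) / 2, (hi + lo) / 2
    nodes, weights = gauss_legendre(num_points)
    result = []
    for i, p_i in enumerate(posteriors):
        total = 0.0
        for t, w in zip(nodes, weights):
            x = mid + half * t  # map the node from [-1, 1] to [lo, hi]
            others = math.prod(p_j.cdf(x)
                               for j, p_j in enumerate(posteriors) if j != i)
            total += w * p_i.pdf(x) * others
        result.append(half * total)
    return result


estimates = p_max([NormalDist(1.0, 1.0), NormalDist(0.0, 1.0)], delta=1e-6)
```

Because the integration range is truncated, the estimates sum to slightly less than one; a smaller delta widens the range and reduces that gap.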
class multibeep.bandits.empirical

Bases: multibeep.bandits.base

This bandit automatically provides a Gaussian posterior based on the returned rewards. It does not provide a predictive posterior at the moment.

class multibeep.bandits.empirical_bandit

Bases: multibeep.bandits.base

class multibeep.bandits.last_n_pulls(unsigned int n=1)

Bases: multibeep.bandits.base

Similar to the empirical_bandit, but only the last few results are used to compute the empirical posterior.

Parameters:n (unsigned int) – number of previous rewards used to compute the posterior
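The windowing idea can be sketched without multibeep as follows. The class name is hypothetical, and the exact form of the posterior that the library derives from the window is not specified here, so the sketch only reports the sample mean and sample variance of the last n rewards.

```python
from collections import deque
from statistics import fmean, variance


class LastNRewards:
    """Hypothetical helper that keeps only the n most recent rewards."""

    def __init__(self, n=1):
        self.window = deque(maxlen=n)  # older rewards are dropped automatically

    def add(self, reward):
        self.window.append(reward)

    def estimate(self):
        """Return (mean, sample variance) of the rewards in the window."""
        mean = fmean(self.window)
        # The sample variance is undefined for fewer than two rewards.
        var = variance(self.window) if len(self.window) > 1 else float("nan")
        return mean, var


tracker = LastNRewards(n=3)
for reward in [10.0, 1.0, 2.0, 3.0]:
    tracker.add(reward)
mean, var = tracker.estimate()  # based on 1.0, 2.0, 3.0 only
```

A small n makes the estimate react quickly to non-stationary rewards at the cost of a noisier mean; n=1 uses only the latest reward, for which a sample variance is not defined.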
class multibeep.bandits.last_n_pulls_bandit

Bases: multibeep.bandits.base

class multibeep.bandits.posterior

Bases: multibeep.bandits.base

This bandit requires every arm added to provide a posterior!

class multibeep.bandits.posterior_bandit

Bases: multibeep.bandits.base

To Do’s

  1. Integrate Joel’s code for p_max
  2. Write a method to convert an index-based vector to an identifier-based representation
  3. Bandit should know the best ‘true mean’ for easier regret computations (could change when adding/deactivating an arm)