The bandits submodule


class multibeep.bandits.arm_info

Bases: object

estimated_mean – ‘float_t’

estimated_variance – ‘float_t’

identifier – ‘unsigned int’

index – ‘unsigned int’

is_active – ‘bool’

name – ‘string’

num_pulls – ‘long double’

p_max – ‘float_t’

posterior – multibeep.util.posterior_class

real_mean – ‘float_t’

real_variance – ‘float_t’

rewards – ‘vector[float_t]’

class multibeep.bandits.base

Bases: object

Base class for all bandits.

It contains the functionality common to all subclasses. To access the arm_info objects of the contained arms, use the index operator ‘[]’.

add_arm(self, base arm)

Adds an arm to the bandit.

Parameters:	arm (multibeep.arms.base) – an instantiated arm
Returns:	unsigned int – the unique identifier associated with the arm just added

deactivate_by_confidence_gap(self, float_t delta, bool consider_inactive_arms=True)

Deactivates arms by comparing the confidence bounds computed from their posteriors.

Parameters:	delta (double) – determines the width of the confidence bounds: the lower and upper bounds are the delta/2 and 1-delta/2 quantiles, respectively.
	consider_inactive_arms (bool) – if True, the bounds of the inactive arms are also considered when finding the largest lower bound.
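The comparison can be sketched in plain Python as follows. This is a minimal sketch, not multibeep's implementation: it assumes Gaussian posteriors (modelled with `statistics.NormalDist`), assumes that an active arm is deactivated when its upper bound lies below the largest lower bound, and uses a hypothetical dictionary from identifiers to posteriors in place of the bandit's arm_info objects. The function name `confidence_gap_survivors` is also hypothetical.

```python
from statistics import NormalDist


def confidence_gap_survivors(posteriors, active, delta, consider_inactive_arms=True):
    """Return the identifiers of the arms that stay active (hypothetical helper).

    posteriors: dict mapping arm identifier -> NormalDist posterior (assumed Gaussian)
    active: set of identifiers of the currently active arms
    """
    # Lower and upper bounds are the delta/2 and 1-delta/2 quantiles.
    bounds = {
        ident: (dist.inv_cdf(delta / 2), dist.inv_cdf(1 - delta / 2))
        for ident, dist in posteriors.items()
    }
    # Optionally let inactive arms take part in the search for the largest lower bound.
    candidates = posteriors.keys() if consider_inactive_arms else active
    best_lower = max(bounds[ident][0] for ident in candidates)
    # Assumed rule: keep an active arm unless its upper bound is below best_lower.
    return {ident for ident in active if bounds[ident][1] >= best_lower}


posteriors = {
    0: NormalDist(mu=1.0, sigma=0.1),
    1: NormalDist(mu=0.9, sigma=0.1),
    2: NormalDist(mu=0.2, sigma=0.1),
}
print(confidence_gap_survivors(posteriors, {0, 1, 2}, delta=0.05))  # {0, 1}
```

With delta=0.05 the bounds are roughly mean ± 0.196, so arm 2 (upper bound about 0.396) falls below arm 0's lower bound (about 0.804) and is dropped, while the overlapping arm 1 survives.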

deactivate_by_identifier(self, unsigned int ident)

Deactivates an arm based on its unique identifier.

Parameters:	ident (unsigned int) – the identifier of the arm that should be deactivated. In contrast to indices, identifiers remain constant over the lifetime of a bandit.

deactivate_by_index(self, unsigned int index)

Deactivates an arm based on its current index.

Parameters:	index (unsigned int) – the index of the arm that should be deactivated. Note that the indices of other arms might change after a deactivation, so do not call this repeatedly with indices computed beforehand. Use deactivate_by_identifier to deactivate multiple arms.
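The pitfall can be illustrated with a plain Python list. For illustration only, this assumes that the active arms are kept in a contiguous sequence, so that removing one arm shifts the indices of every arm after it; the arm names are hypothetical and also serve as identifiers here.

```python
# Active arms stored contiguously; an arm's index is its position in the list.
active = ["arm_a", "arm_b", "arm_c", "arm_d"]

# Goal: deactivate arm_b (index 1) and arm_c (index 2).
by_index = list(active)
for index in [1, 2]:
    del by_index[index]  # the second deletion uses an index that has shifted
print(by_index)  # ['arm_a', 'arm_c'] (arm_d was removed instead of arm_c)

# Identifiers stay valid across deactivations.
by_identifier = list(active)
for ident in ["arm_b", "arm_c"]:
    by_identifier.remove(ident)
print(by_identifier)  # ['arm_a', 'arm_d']
```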

deactivate_n_worst(self, unsigned int n)

Deactivates the n arms with the lowest estimated means; no other information from the posteriors is used.

Parameters:	n (unsigned int) – the number of arms to deactivate

min_pull_arms(self, unsigned int min_num_pulls)

Ensures that every active arm has been pulled at least the given number of times.

Parameters:	min_num_pulls (unsigned int) – minimum number of pulls required for every active arm
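The guarantee can be sketched as a simple top-up loop. This is a hypothetical stand-in, not multibeep's code: pull counts are assumed to live in a plain list indexed like the active arms, and `pull` is a hypothetical callback standing in for pulling an arm.

```python
def pull_until_minimum(num_pulls, min_num_pulls, pull):
    """Pull each arm until it has at least min_num_pulls pulls (hypothetical helper).

    num_pulls: list of pull counts, one entry per active arm (updated in place)
    pull: callback that pulls the arm at the given index
    """
    for index, count in enumerate(num_pulls):
        # Arms that already meet the minimum are not pulled again.
        for _ in range(max(0, min_num_pulls - count)):
            pull(index)
            num_pulls[index] += 1


rewards = {0: [], 1: [], 2: []}
counts = [0, 3, 1]
pull_until_minimum(counts, 2, lambda i: rewards[i].append(1.0))
print(counts)  # [2, 3, 2]
```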

number_of_active_arms(self)

number_of_arms(self)

Returns:	unsigned int – total number of arms associated with the bandit

number_of_pulled_arms(self)

number_of_pulls(self)

pull_by_identifier(self, unsigned int ident)

Pulls the arm with the given identifier.

Parameters:	ident (unsigned int) – the identifier of the arm to pull

pull_by_index(self, unsigned int index)

Pulls the arm at the given index. Note that the index of an arm might change when an arm is deactivated.

Parameters:	index (unsigned int) – the index of the arm to pull

reactivate_by_identifier(self, unsigned int ident)

reactivate_by_index(self, unsigned int index)

sort_active_arms_by_mean(self)

Sorts the remaining active arms by their estimated mean in descending order, so that the ‘best’ arm has index zero.
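The resulting order can be sketched with Python's `sorted`. The (identifier, estimated_mean) pairs below are hypothetical, and reading the last n entries as the arms deactivate_n_worst would remove is an assumption based on its description, not something the source states.

```python
# Hypothetical (identifier, estimated_mean) pairs for the active arms.
arms = [(7, 0.31), (2, 0.85), (5, 0.12), (9, 0.64)]

# Descending order: the arm with the highest estimated mean gets index zero.
ranked = sorted(arms, key=lambda arm: arm[1], reverse=True)
print([ident for ident, _ in ranked])  # [2, 9, 7, 5]

# Assumption: the n worst arms are the last n entries of this ordering.
n = 2
worst = [ident for ident, _ in ranked[len(ranked) - n:]]
print(worst)  # [7, 5]
```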

update_p_max(self, bool consider_inactive=False, float_t delta=0.01, unsigned int GL_num_points=64)

Updates the p_max value of every arm; the results are stored in the arm_info objects.

Parameters:	consider_inactive (bool) – whether to consider all arms, including inactive ones, during the computation.
	delta (float) – controls the confidence interval. See multibeep.util.posterior.base.support for more details.
	GL_num_points (unsigned int) – number of points used in the Gauss-Legendre integration.
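A Gauss-Legendre computation of this kind might look like the sketch below. Several points are assumptions not stated in the source: p_max is read as the probability that an arm has the largest value, p_i = ∫ pdf_i(x) ∏_{j≠i} cdf_j(x) dx; the posteriors are Gaussian (`statistics.NormalDist`); and the integration interval spans the delta/2 and 1-delta/2 quantiles of all posteriors, which is only a guess at what `support` returns. The helpers `gauss_legendre` and `estimate_p_max` are hypothetical, and the sketch treats every posterior passed in as a candidate (the `consider_inactive` choice is left to the caller).

```python
import math
from statistics import NormalDist


def gauss_legendre(num_points):
    """Nodes and weights of the Gauss-Legendre rule on [-1, 1] via Newton's method."""
    nodes, weights = [], []
    for i in range(1, num_points + 1):
        x = math.cos(math.pi * (i - 0.25) / (num_points + 0.5))  # initial guess
        for _ in range(100):
            # Three-term recurrence: p1 = P_n(x), p0 = P_{n-1}(x).
            p0, p1 = 1.0, x
            for k in range(2, num_points + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = num_points * (x * p1 - p0) / (x * x - 1)  # P_n'(x)
            step = p1 / dp
            x -= step
            if abs(step) < 1e-14:
                break
        nodes.append(x)
        weights.append(2 / ((1 - x * x) * dp * dp))
    return nodes, weights


def estimate_p_max(posteriors, delta=0.01, num_points=64):
    """Approximate P(arm i is the maximum) for a list of Gaussian posteriors."""
    # Assumed integration range: union of the delta/2 .. 1-delta/2 quantile ranges.
    lo = min(dist.inv_cdf(delta / 2) for dist in posteriors)
    hi = max(dist.inv_cdf(1 - delta / 2) for dist in posteriors)
    half_width, center = (hi - lo) / 2, (hi + lo) / 2
    nodes, weights = gauss_legendre(num_points)
    result = []
    for i, dist_i in enumerate(posteriors):
        total = 0.0
        for t, w in zip(nodes, weights):
            x = center + half_width * t  # map node from [-1, 1] to [lo, hi]
            value = dist_i.pdf(x)
            for j, dist_j in enumerate(posteriors):
                if j != i:
                    value *= dist_j.cdf(x)
            total += w * value
        result.append(half_width * total)
    return result


print(estimate_p_max([NormalDist(0, 1), NormalDist(0, 1)]))  # roughly [0.495, 0.495]
```

Because the integral is truncated to the confidence interval, the values sum to slightly less than one; for two identical standard normals the exact truncated value is (0.995² - 0.005²)/2 = 0.495 per arm.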

class multibeep.bandits.empirical

Bases: multibeep.bandits.base

This bandit automatically provides a Gaussian posterior based on the returned rewards. It currently does not provide a predictive posterior.

class multibeep.bandits.empirical_bandit

Bases: multibeep.bandits.base

class multibeep.bandits.last_n_pulls(unsigned int n=1)

Bases: multibeep.bandits.base

Similar to the empirical_bandit, but only the last n rewards are used to compute the empirical posterior.

Parameters:	n (unsigned int) – number of previous rewards used to compute the posterior

class multibeep.bandits.last_n_pulls_bandit

Bases: multibeep.bandits.base
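The difference between the empirical and the last-n posteriors can be sketched with a bounded `collections.deque`. The class name `EmpiricalPosterior` is hypothetical, and using the standard error of the sample mean as the Gaussian spread is an assumption; the source only states that the posterior is Gaussian and computed from the rewards. This sketch also needs at least two rewards, so unlike last_n_pulls it cannot work with the default n=1.

```python
import math
from collections import deque
from statistics import NormalDist, fmean, stdev


class EmpiricalPosterior:
    """Gaussian posterior over an arm's mean built from observed rewards (hypothetical).

    window=None keeps every reward (empirical-style); window=n keeps only
    the last n rewards (last_n_pulls-style).
    """

    def __init__(self, window=None):
        # deque(maxlen=None) is unbounded; otherwise the oldest rewards are dropped.
        self.rewards = deque(maxlen=window)

    def add_reward(self, reward):
        self.rewards.append(reward)

    def posterior(self):
        if len(self.rewards) < 2:
            raise ValueError("need at least two rewards to estimate a spread")
        # Center on the sample mean; assumed spread is the standard error of that mean.
        mean = fmean(self.rewards)
        std_error = stdev(self.rewards) / math.sqrt(len(self.rewards))
        return NormalDist(mean, std_error)


last_three = EmpiricalPosterior(window=3)
for r in [1.0, 2.0, 3.0, 4.0, 5.0]:
    last_three.add_reward(r)
print(last_three.posterior().mean)  # 4.0 (only 3.0, 4.0 and 5.0 are kept)
```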

class multibeep.bandits.posterior

Bases: multibeep.bandits.base

This bandit requires every arm added to it to provide a posterior.

class multibeep.bandits.posterior_bandit

Bases: multibeep.bandits.base

To Do’s

Integrate Joel’s code for p_max.
Write a method to convert an index-based vector into an identifier-based representation.
The bandit should know the best ‘true mean’ for easier regret computations (this could change when an arm is added or deactivated).
