.. multibeep documentation master file, created by
   sphinx-quickstart on Fri Sep 2 14:22:32 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

The policies submodule
======================

Contents:

.. automodule:: multibeep.policies
    :members:
    :undoc-members:
    :show-inheritance:

TODO
====

Policies that need to be coded:

1. UCB1, UCB1-NORMAL (see the sketch after this list)

   P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the
   multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.

2. lil' UCB

   K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck. lil' UCB: An Optimal
   Exploration Algorithm for Multi-Armed Bandits. JMLR: Workshop and
   Conference Proceedings, 35:1–17, 2014.

3. KL-UCB (see the index formula after this list)

   A. Garivier and O. Cappé. The KL-UCB Algorithm for Bounded Stochastic
   Bandits and Beyond. COLT 2011: 359–376.

4. MOSS

   J.-Y. Audibert and S. Bubeck. Minimax Policies for Adversarial and
   Stochastic Bandits. Proceedings of the 22nd Annual Conference on
   Learning Theory, 2009.

5. Epsilon-Greedy (see the sketch after this list)

   http://cs.mcgill.ca/~vkules/bandits.pdf

6. SoftMax/Boltzmann Exploration (see the sketch after this list)

   http://cs.mcgill.ca/~vkules/bandits.pdf

7. Poker

   J. Vermorel and M. Mohri. Multi-armed bandit algorithms and empirical
   evaluation. European Conference on Machine Learning, Springer Berlin
   Heidelberg, 2005.

8. F-Race

   M. Birattari et al. F-Race and iterated F-Race: An overview. Experimental
   Methods for the Analysis of Optimization Algorithms, Springer Berlin
   Heidelberg, 2010, 311–336.

9. Entropy search with different update rules:

   a. proper predictive posterior to compute the expected change in H(pmax)
   b. sampling from the reward history to approximate the expected change
      in H(pmax)
   c. Joel's approach: sampling from the posterior while pretending the mean
      is fixed (the expected 'information of one mean')
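For reference while these policies are ported, UCB1 (item 1) pulls each arm
once and then always plays the arm maximizing the empirical mean plus an
exploration bonus:

.. math::

   j_t = \arg\max_j \; \bar{x}_j + \sqrt{\frac{2 \ln t}{n_j}},

where :math:`\bar{x}_j` is the empirical mean reward of arm :math:`j`,
:math:`n_j` its number of pulls, and :math:`t` the total number of pulls so
far. The following is a minimal standalone sketch of that loop; the names
(``pull``, ``n_arms``, ``n_rounds``) are illustrative, not the multibeep
interface:

.. code-block:: python

    import math
    import random

    def ucb1(pull, n_arms, n_rounds):
        """Minimal UCB1: play each arm once, then always pull the arm with
        the highest empirical mean plus the bonus sqrt(2 ln t / n_j)."""
        counts = [0] * n_arms    # n_j: number of pulls of arm j
        sums = [0.0] * n_arms    # cumulative reward of arm j

        for j in range(n_arms):  # initialization: pull every arm once
            sums[j] += pull(j)
            counts[j] += 1

        for t in range(n_arms, n_rounds):  # t = total pulls so far
            index = [sums[j] / counts[j]
                     + math.sqrt(2.0 * math.log(t) / counts[j])
                     for j in range(n_arms)]
            best = max(range(n_arms), key=lambda j: index[j])
            sums[best] += pull(best)
            counts[best] += 1

        return counts, sums

    # hypothetical usage: three Bernoulli arms with success rates 0.2/0.5/0.8
    means = [0.2, 0.5, 0.8]
    counts, sums = ucb1(lambda j: float(random.random() < means[j]), 3, 1000)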
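KL-UCB (item 3) replaces the quadratic bonus with an upper confidence bound
derived from the Kullback-Leibler divergence. For rewards in :math:`[0, 1]`,
the index analyzed by Garivier and Cappé is

.. math::

   U_j(t) = \max\Bigl\{ q \in [0, 1] :
       n_j \, d(\bar{x}_j, q) \le \log t + c \log\log t \Bigr\},
   \qquad
   d(p, q) = p \log\frac{p}{q} + (1 - p) \log\frac{1 - p}{1 - q}.

Since :math:`d(\bar{x}_j, \cdot)` is increasing on :math:`[\bar{x}_j, 1]`,
the implementation will have to compute this maximum numerically, e.g. by
bisection.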
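Epsilon-Greedy and SoftMax/Boltzmann exploration (items 5 and 6) are simple
enough to sketch together. Both map a list of current empirical mean
estimates to an arm choice; again, the names below are hypothetical and not
the multibeep API:

.. code-block:: python

    import math
    import random

    def epsilon_greedy(estimates, epsilon):
        """With probability epsilon explore a uniformly random arm,
        otherwise exploit the arm with the highest estimated mean."""
        if random.random() < epsilon:
            return random.randrange(len(estimates))
        return max(range(len(estimates)), key=lambda j: estimates[j])

    def boltzmann(estimates, temperature):
        """SoftMax/Boltzmann exploration: sample arm j with probability
        proportional to exp(estimates[j] / temperature)."""
        weights = [math.exp(m / temperature) for m in estimates]
        return random.choices(range(len(estimates)), weights=weights)[0]

A high temperature makes ``boltzmann`` approach uniform exploration, while
as the temperature tends to zero it becomes greedy.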