Minimax policies for adversarial and stochastic bandits - École des Ponts ParisTech Accéder directement au contenu
Communication Dans Un Congrès Année : 2009

Minimax policies for adversarial and stochastic bandits

Résumé

We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit prob- lem. Concretely, we remove an extraneous loga- rithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate log- arithmic in the number of plays.
Fichier principal
Vignette du fichier
COLT09a.pdf (179.25 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-00834882 , version 1 (17-06-2013)

Identifiants

  • HAL Id : hal-00834882 , version 1

Citer

Jean-Yves Audibert, Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. COLT, Jun 2009, Montreal, Canada. pp.217-226. ⟨hal-00834882⟩
484 Consultations
887 Téléchargements

Partager

Gmail Facebook X LinkedIn More