Regret Bounds for Gaussian Process Bandit Problems - Archive ouverte HAL Accéder directement au contenu
Communication Dans Un Congrès Année : 2010

Regret Bounds for Gaussian Process Bandit Problems

(1) , (2, 3, 4) , (5) , (1)
1
2
3
4
5

Résumé

Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function de ning the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
Fichier principal
Vignette du fichier
AISTATS10.pdf (701.79 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-00654517 , version 1 (22-12-2011)

Identifiants

  • HAL Id : hal-00654517 , version 1

Citer

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor. Regret Bounds for Gaussian Process Bandit Problems. AISTATS 2010 - Thirteenth International Conference on Artificial Intelligence and Statistics, May 2010, Chia Laguna Resort, Sardinia, Italy. pp.273-280. ⟨hal-00654517⟩
839 Consultations
243 Téléchargements

Partager

Gmail Facebook Twitter LinkedIn More