Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder 1 Jean-Yves Audibert 2, 3, 4 Manfred Opper 5 John Shawe-Taylor 1
2 IMAGINE [Marne-la-Vallée]
CSTB - Centre Scientifique et Technique du Bâtiment, LIGM - Laboratoire d'Informatique Gaspard-Monge, ENPC - École des Ponts ParisTech
3 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract : Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function de ning the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
Type de document :
Communication dans un congrès
AISTATS 2010 - Thirteenth International Conference on Artificial Intelligence and Statistics, May 2010, Chia Laguna Resort, Sardinia, Italy. 9, pp.273-280, 2010
Liste complète des métadonnées

Littérature citée [28 références]  Voir  Masquer  Télécharger

https://hal-enpc.archives-ouvertes.fr/hal-00654517
Contributeur : Jean-Yves Audibert <>
Soumis le : jeudi 22 décembre 2011 - 10:06:06
Dernière modification le : jeudi 7 février 2019 - 15:49:20
Document(s) archivé(s) le : vendredi 23 mars 2012 - 02:21:32

Fichier

AISTATS10.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-00654517, version 1

Citation

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor. Regret Bounds for Gaussian Process Bandit Problems. AISTATS 2010 - Thirteenth International Conference on Artificial Intelligence and Statistics, May 2010, Chia Laguna Resort, Sardinia, Italy. 9, pp.273-280, 2010. 〈hal-00654517〉

Partager

Métriques

Consultations de la notice

1384

Téléchargements de fichiers

417