Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Jean-Yves Audibert; Remi Munos; Csaba Szepesvari

doi:10.1016/j.tcs.2009.01.016

Journal article, Theoretical Computer Science, Year: 2009

Jean-Yves Audibert

Role: Author
PersonId: 931557

imagine [Marne-la-Vallée]

Models of visual object recognition and scene understanding

Remi Munos

Role: Author

Sequential Learning

Csaba Szepesvari

Role: Author

University of Alberta

Abstract

Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates, provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where, with probability one, the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations. © 2009 Elsevier B.V. All rights reserved.
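
To make the idea of a variance-aware upper confidence bound concrete, here is a minimal Python sketch. The abstract does not state the index formula, so the form used here is an assumption based on the commonly cited UCB-V index: empirical mean plus sqrt(2 * variance * E / s) plus 3 * c * b * E / s, with E = zeta * log(t), s the number of pulls of the arm, and payoffs assumed to lie in [0, b]. The function names `ucbv_index` and `run_ucbv`, the default values `zeta=1.2` and `c=1.0`, and the example arms are illustrative choices, not taken from this page.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence index for one arm (assumed form)."""
    explore = zeta * math.log(t)
    # Variance-dependent term plus a range-dependent correction term.
    return mean + math.sqrt(2.0 * var * explore / pulls) + c * 3.0 * b * explore / pulls


def run_ucbv(arms, horizon, b=1.0, seed=0):
    """Play `horizon` rounds; `arms` are callables rng -> payoff in [0, b]."""
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k        # number of times each arm was pulled
    total = [0.0] * k      # sum of payoffs per arm
    total_sq = [0.0] * k   # sum of squared payoffs per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # pull every arm once to initialise the estimates
        else:
            scores = []
            for i in range(k):
                mean = total[i] / pulls[i]
                # Empirical variance, clipped at 0 against rounding error.
                var = max(total_sq[i] / pulls[i] - mean * mean, 0.0)
                scores.append(ucbv_index(mean, var, pulls[i], t, b))
            arm = max(range(k), key=scores.__getitem__)
        reward = arms[arm](rng)
        pulls[arm] += 1
        total[arm] += reward
        total_sq[arm] += reward * reward
    return pulls


# Hypothetical example: a good Bernoulli arm and a low-variance suboptimal arm.
example_arms = [
    lambda rng: float(rng.random() < 0.9),
    lambda rng: 0.5 + 0.02 * (rng.random() - 0.5),
]
example_pulls = run_ucbv(example_arms, horizon=5000)
```

In this sketch a suboptimal arm whose payoffs barely vary gets a small variance term, so its index shrinks toward its mean after relatively few pulls; this is the low-variance situation in which the abstract reports an advantage for variance-aware algorithms.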

Domains

Other [cs.OH]

https://enpc.hal.science/hal-00711069

Submitted on: Friday, 22 June 2012, 12:04:20

Last modified on: Thursday, 28 March 2024, 03:25:15

Dates and versions

hal-00711069 , version 1 (22-06-2012)

Identifiers

HAL Id: hal-00711069, version 1
DOI: 10.1016/j.tcs.2009.01.016

Cite

Jean-Yves Audibert, Remi Munos, Csaba Szepesvari. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 2009, 410 (19), pp. 1876-1902. ⟨10.1016/j.tcs.2009.01.016⟩. ⟨hal-00711069⟩

Collections

ENS-PARIS ENPC UNIV-LILLE3 CNRS INRIA PARISTECH LAGIS LIGM IMAGINE INRIA2 PSL UNIV-EIFFEL JSE2024
