Robustness of stochastic bandit policies

Antoine Salomon; Jean-Yves Audibert

doi:10.1016/j.tcs.2013.09.019

Journal article, Theoretical Computer Science, Year: 2014


Antoine Salomon

Role: Author
PersonId: 916255

Laboratoire d'Informatique Gaspard-Monge

Jean-Yves Audibert

Role: Author
PersonId: 931557

Laboratoire d'Informatique Gaspard-Monge

Statistical Machine Learning and Parsimony

Abstract

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy whose regret is of order log n with probability at least 1-1/n. They also show that the popular UCB1 policy of Auer et al. [3] does not share this property. This work first answers an open question: it extends this negative result to any anytime policy (i.e. any policy that does not take the number of plays n into account). A second contribution is the design of robust anytime policies for specific multi-armed bandit problems in which restrictions are placed on the set of possible distributions of the arms. We also show that, for any policy (i.e. even when the number of plays n is known), the regret is of order at least log n with probability Ω(1/n), so that the policy of Audibert et al. has the best possible deviation properties.
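To make the notions of anytime policy and regret concrete, the following is a minimal sketch, not the authors' code, of the UCB1 policy of Auer et al. on Bernoulli arms. The function name `ucb1_pseudo_regret`, the Bernoulli reward model, the use of log t in the exploration bonus, and the choice to report pseudo-regret (sum over arms of gap times number of pulls) are all assumptions made for illustration. Note that the index never uses the horizon n, which is what makes the policy anytime.

```python
import math
import random


def ucb1_pseudo_regret(means, n, seed=0):
    """Run a UCB1-style policy for n plays on Bernoulli arms.

    Returns (pseudo_regret, pull_counts). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm has been played
    sums = [0.0] * k    # cumulative reward collected from each arm
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # initialisation: play each arm once
        else:
            # Index = empirical mean + exploration bonus sqrt(2 log t / T_a).
            # The horizon n does not appear here: the policy is anytime.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    regret = sum((best - m) * c for m, c in zip(means, counts))
    return regret, counts


if __name__ == "__main__":
    # Look at the spread of the regret over independent runs, not only its mean.
    runs = [ucb1_pseudo_regret([0.5, 0.6], n=1000, seed=s)[0] for s in range(50)]
    print("mean regret:", sum(runs) / len(runs))
    print("worst run:  ", max(runs))
```

Comparing the worst run with the mean gives a rough empirical view of the deviations the abstract is concerned with; a simulation of this size cannot confirm or refute the paper's probabilistic statements.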

Domains

Computer Vision and Pattern Recognition [cs.CV]; Computational Geometry [cs.CG]; Document and Text Processing; Cryptography and Security [cs.CR]; Software Engineering [cs.SE]; Programming Languages [cs.PL]

Contributor: Renaud Marlet

https://enpc.hal.science/hal-01801056

Submitted on: Monday, 28 May 2018, 10:29:44

Last modified on: Friday, 19 April 2024, 16:18:57

Dates and versions

hal-01801056, version 1 (28-05-2018)

Identifiers

HAL Id: hal-01801056, version 1
DOI: 10.1016/j.tcs.2013.09.019

Cite

Antoine Salomon, Jean-Yves Audibert. Robustness of stochastic bandit policies. Theoretical Computer Science, 2014, 519, pp. 46-67. ⟨10.1016/j.tcs.2013.09.019⟩. ⟨hal-01801056⟩


Collections

ENS-PARIS ENPC CNRS INRIA LIGM_A3SI PARISTECH LIGM INRIA2 PSL ESIEE-PARIS UNIV-EIFFEL JSE2024
