Regret Bounds and Minimax Policies under Partial Monitoring

Jean-Yves Audibert; Sébastien Bubeck

Journal article, Journal of Machine Learning Research, Year: 2010

Jean-Yves Audibert

Role: Author
PersonId: 931557

imagine [Marne-la-Vallée]

Models of visual object recognition and scene understanding

Sébastien Bubeck

Role: Author
PersonId: 844095

Sequential Learning

Abstract

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function ψ, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(ηx) + γ/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with ψ(x) = (η/(−x))^q + γ/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
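The implicit normalization named in the abstract can be illustrated with a minimal sketch. It assumes, as a reading of the forecaster's name rather than a statement taken from this page, that the probability of arm i is ψ(V_i − C), where V_i is some score for arm i and C is the constant that makes the probabilities sum to 1. The function names, the bisection solver, the bracketing intervals and the simplification γ = 0 are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

def inf_probabilities(values, psi, lo, hi, iters=200):
    """Return p_i = psi(v_i - C), with C found by bisection so that sum(p) = 1.

    Assumes psi is increasing, so the total mass is decreasing in C, and that
    the bracket satisfies total(lo) >= 1 >= total(hi).
    """
    def total(c):
        return sum(psi(v - c) for v in values)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid  # too much mass: C must grow
        else:
            hi = mid  # too little mass: C must shrink
    c = 0.5 * (lo + hi)
    return [psi(v - c) for v in values]

def exp_weights(values, eta):
    """psi(x) = exp(eta * x), with gamma = 0 (illustrative simplification)."""
    m = max(values)
    # At C = m the largest term alone equals 1; at C = m + log(K)/eta
    # every term is at most 1/K, so the bracket is valid.
    hi = m + math.log(len(values)) / eta
    return inf_probabilities(values, lambda x: math.exp(eta * x), m, hi)

def poly_weights(values, eta, q):
    """psi(x) = (eta / (-x))**q for x < 0, with gamma = 0 (illustrative)."""
    m = max(values)
    # C stays at least eta above the largest value, so x = v - C is negative.
    lo = m + eta
    hi = m + eta * len(values) ** (1.0 / q)
    return inf_probabilities(values, lambda x: (eta / (-x)) ** q, lo, hi)

if __name__ == "__main__":
    scores = [0.0, 1.0, 3.0]  # hypothetical per-arm scores
    print(exp_weights(scores, eta=0.5))
    print(poly_weights(scores, eta=1.0, q=2.0))
```

Under these assumptions the exponential choice of ψ gives the same probabilities as the usual exponential-weights (softmax) formula, which matches the abstract's remark that INF reduces to the exponentially weighted average forecaster. The polynomial choice has no closed-form normalizer, so the constant is solved for numerically.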

Keywords

Bandits (adversarial and stochastic); regret bound; minimax rate; label efficient; upper confidence bound (UCB) policy; online learning; prediction with limited feedback

Domains

Other [stat.ML]

Main file

JMLR10.pdf (325.61 KB)

Origin: Explicit agreement for this deposit

https://enpc.hal.science/hal-00654356

Submitted on: Wednesday, December 21, 2011, 17:08:02

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Thursday, March 22, 2012, 02:31:29

Dates and versions

hal-00654356, version 1 (21-12-2011)

Identifiers

HAL Id: hal-00654356, version 1

Cite

Jean-Yves Audibert, Sébastien Bubeck. Regret Bounds and Minimax Policies under Partial Monitoring. Journal of Machine Learning Research, 2010, 11, pp.2785-2836. ⟨hal-00654356⟩

Collections

ENS-PARIS ENPC UNIV-LILLE3 CNRS INRIA UNIV-MLV LIGM_A3SI PARISTECH LAGIS LIGM IMAGINE INRIA2 PSL ESIEE-PARIS ANR UNIV-EIFFEL JSE2024
