
Improved rates for the stochastic continuum-armed bandit problem. (English) Zbl 1203.68134

Bshouty, Nader H. (ed.) et al., Learning theory. 20th annual conference on learning theory, COLT 2007, San Diego, CA, USA, June 13–15, 2007. Proceedings. Berlin: Springer (ISBN 978-3-540-72925-9). Lecture Notes in Computer Science 4539. Lecture Notes in Artificial Intelligence, 454–468 (2007).
Summary: Considering one-dimensional continuum-armed bandit problems, we propose an improvement of Kleinberg's algorithm and a new set of conditions that give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while smoothness of the mean payoff function is required only at its maxima. Under these new assumptions, new bounds on the expected regret are derived. In particular, we show that, apart from logarithmic factors, the expected regret scales with the square root of the number of trials, provided that the mean payoff function has finitely many maxima and its second derivatives are continuous and non-vanishing at the maxima. This improves a previous result of Cope by weakening the assumptions on the function. We also derive matching lower bounds. To complement the bounds on the expected regret, we provide high-probability bounds which exhibit similar scaling.
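As a schematic restatement of this scaling (the notation is introduced here for illustration and may differ in detail from the paper): writing \(\mu\) for the mean payoff function on \([0,1]\) and \(X_t\) for the arm chosen at trial \(t\), the regret after \(n\) trials is
\[ R_n \;=\; n \sup_{x\in[0,1]} \mu(x) \;-\; \sum_{t=1}^{n} \mu(X_t), \]
and the result stated above asserts that, when \(\mu\) has finitely many maxima with continuous, non-vanishing second derivatives at those maxima, \(\mathbb{E}[R_n] = \tilde{O}(\sqrt{n})\), with a matching lower bound of order \(\sqrt{n}\) up to logarithmic factors.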
For the entire collection see [Zbl 1121.68002].

MSC:

68T05 Learning and adaptive systems in artificial intelligence
91A60 Probabilistic games; gambling
93E35 Stochastic learning and adaptive control
Full Text: DOI