Adaptive-treed bandits. (English) Zbl 1364.90269

Summary: We describe a novel algorithm for noisy global optimisation and continuum-armed bandits, with good convergence properties over any continuous reward function having finitely many polynomial maxima. Over such functions, our algorithm achieves square-root regret in bandits, and inverse-square-root error in optimisation, without prior information.

Our algorithm works by reducing these problems to tree-armed bandits, and we also provide new results in this setting. We show it is possible to adaptively combine multiple trees so as to minimise the regret, and also give near-matching lower bounds on the regret in terms of the zooming dimension.
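To fix notation for the rates quoted above (the symbols \(f\), \(x^\ast\), \(x_t\), \(\hat{x}_n\) below are generic placeholders, not the paper's own notation): writing \(f\) for the reward function and \(x^\ast\) for a maximiser, the claims are that the cumulative regret in the bandit setting and the error in the optimisation setting satisfy
\[
R_n \;=\; \sum_{t=1}^{n}\bigl(f(x^\ast)-f(x_t)\bigr) \;=\; \tilde{O}\bigl(\sqrt{n}\bigr),
\qquad
f(x^\ast)-f(\hat{x}_n) \;=\; \tilde{O}\bigl(n^{-1/2}\bigr),
\]
where \(x_t\) is the arm played at time \(t\), \(\hat{x}_n\) is the point reported after \(n\) observations, and \(\tilde{O}\) suppresses logarithmic factors.

The reduction to tree-armed bandits can be illustrated by a minimal Python sketch, assuming a reward function on \([0,1]\): a UCB-style index is kept for each leaf of a binary partition tree, the most optimistic leaf is played, and well-sampled leaves are split. This is a generic hierarchical-bandit skeleton in the spirit of \(\mathscr{X}\)-armed bandits, not the authors' algorithm; the names tree_bandit and split_every, and the confidence radius, are illustrative only.

import math
import random

class Leaf:
    """A cell [lo, hi) of the current partition, with running reward statistics."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.n, self.total = 0, 0.0

    def ucb(self, t):
        # Optimistic index: empirical mean plus a confidence radius
        # (infinite while the cell is unsampled, so every cell is tried once).
        if self.n == 0:
            return float("inf")
        return self.total / self.n + math.sqrt(2.0 * math.log(t) / self.n)

def tree_bandit(reward, horizon, split_every=32):
    """Generic tree-armed bandit sketch: play the most optimistic leaf,
    refining the tree by splitting leaves once they are well sampled."""
    leaves = [Leaf(0.0, 1.0)]
    for t in range(1, horizon + 1):
        leaf = max(leaves, key=lambda l: l.ucb(t))
        x = random.uniform(leaf.lo, leaf.hi)   # play a point of the chosen cell
        leaf.n += 1
        leaf.total += reward(x)                # noisy reward observation
        if leaf.n >= split_every:              # adaptive refinement; a real
            mid = (leaf.lo + leaf.hi) / 2.0    # algorithm would reuse, not
            leaves.remove(leaf)                # discard, the parent's statistics
            leaves += [Leaf(leaf.lo, mid), Leaf(mid, leaf.hi)]
    # Report the midpoint of the leaf with the best empirical mean.
    best = max(leaves, key=lambda l: l.total / l.n if l.n else float("-inf"))
    return (best.lo + best.hi) / 2.0

# Example: a smooth reward with a single quadratic maximum, observed in noise.
f = lambda x: 1.0 - (x - 0.3) ** 2
print(tree_bandit(lambda x: f(x) + random.gauss(0.0, 0.1), horizon=5000))

The key design question, which the paper addresses, is how to choose the splitting rule and combine several candidate trees without knowing the smoothness of \(f\) in advance; the fixed split_every threshold above sidesteps exactly that issue.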

MSC:

90C26 Nonconvex programming, global optimization
62G20 Asymptotic properties of nonparametric inference
62L05 Sequential statistical design
65C60 Computational problems in statistics (MSC2010)
