Abstract
One of the main goals of machine learning theory is to characterize the generalization performance of learning algorithms. Most previous results on the generalization ability of learning algorithms assume independent and identically distributed (i.i.d.) samples. Independence, however, is a restrictive assumption for both theory and real-world applications. In this paper we go beyond this classical framework by establishing bounds on the rate of relative uniform convergence for the Empirical Risk Minimization (ERM) algorithm with uniformly ergodic Markov chain samples. We not only obtain generalization bounds for the ERM algorithm, but also show that the ERM algorithm with uniformly ergodic Markov chain samples is consistent. The established theory underlies the application of ERM-type learning algorithms.
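The setting studied in the paper can be illustrated with a minimal sketch, not taken from the paper itself: samples are drawn from a uniformly ergodic Markov chain rather than i.i.d., and ERM simply selects the hypothesis with the smallest empirical risk on those dependent samples. The two-state chain, the finite hypothesis class, and the 0-1 loss below are hypothetical choices for illustration only.

```python
import random

def markov_chain_samples(n, p_stay=0.7, seed=0):
    """Draw n states from a two-state Markov chain with P(stay) = p_stay.
    For p_stay in (0, 1) this chain is uniformly ergodic, so successive
    samples are dependent but the chain mixes at a geometric rate."""
    rng = random.Random(seed)
    x, out = 0, []
    for _ in range(n):
        out.append(x)
        if rng.random() > p_stay:
            x = 1 - x  # switch state
    return out

def erm(samples, labels, hypotheses):
    """Return the hypothesis minimizing the empirical 0-1 risk."""
    def empirical_risk(h):
        return sum(h(x) != y for x, y in zip(samples, labels)) / len(samples)
    return min(hypotheses, key=empirical_risk)

# Noisy target: the label equals the state, flipped with probability 0.1.
label_rng = random.Random(1)
xs = markov_chain_samples(500)
ys = [x if label_rng.random() > 0.1 else 1 - x for x in xs]

# A toy finite hypothesis class: identity, flip, and the two constants.
hypotheses = [lambda x: x, lambda x: 1 - x, lambda x: 0, lambda x: 1]
best = erm(xs, ys, hypotheses)
```

Here the identity hypothesis attains empirical risk close to the 10% label noise, while the alternatives sit near 50% or above, so ERM recovers it despite the dependence between samples; the paper's results bound how fast such empirical risks converge to the true risks uniformly over the hypothesis class.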
Additional information
Supported by the National 973 Project (No. 2013CB329404), the Key Project of the NSF of China (No. 11131006), and the National Natural Science Foundation of China (Nos. 61075054, 61370000).
Cite this article
Zou, B., Xu, Z.B., Xu, J. Generalization bounds of ERM algorithm with Markov chain samples. Acta Math. Appl. Sin. Engl. Ser., 30: 223–238 (2014). https://doi.org/10.1007/s10255-011-0096-4
Keywords
- generalization bounds
- ERM algorithm
- relative uniform convergence
- uniformly ergodic Markov chain
- learning theory