
Consistency and robustness of kernel-based regression in convex risk minimization. (English) Zbl 1129.62031

Summary: We investigate statistical properties for a broad class of modern kernel-based regression (KBR) methods. These kernel methods were developed during the last decade and are inspired by convex risk minimization in infinite-dimensional Hilbert spaces. One leading example is support vector regression. We first describe the relationship between the loss function \(L\) of the KBR method and the tail of the response variable. We then establish the \(L\)-risk consistency of KBR, which provides the mathematical justification for the claim that these methods are able to “learn”. Next, we consider robustness properties of such kernel methods.
In particular, our results allow us to choose the loss function and the kernel to obtain computationally tractable and consistent KBR methods that have bounded influence functions. Furthermore, bounds for the bias and for the sensitivity curve, a finite-sample version of the influence function, are developed, and the relationship between KBR and classical \(M\) estimators is discussed.
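The KBR estimators under study are of the standard regularized form; the following display uses the usual notation for such methods and is added for orientation (it paraphrases the general framework rather than quoting the paper):
\[
f_{P,\lambda} = \operatorname*{arg\,min}_{f \in H} \; \mathbb{E}_{P}\, L\bigl(Y, f(X)\bigr) + \lambda \|f\|_{H}^{2},
\qquad
f_{D,\lambda} = \operatorname*{arg\,min}_{f \in H} \; \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{H}^{2},
\]
where \(H\) is the reproducing kernel Hilbert space of the chosen kernel, \(\lambda > 0\) is a regularization parameter, and \(D = ((x_1,y_1),\dots,(x_n,y_n))\) is the sample; support vector regression corresponds to the \(\epsilon\)-insensitive loss.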
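Robustness in this sense is quantified via Hampel's influence function, \(\mathrm{IF}(z; T, P) = \lim_{\epsilon \downarrow 0} \bigl(T((1-\epsilon)P + \epsilon\delta_z) - T(P)\bigr)/\epsilon\); roughly, a Lipschitz-continuous loss such as Huber's yields a bounded influence function, while the least-squares loss does not. A minimal numerical sketch of this effect (assuming NumPy; the function names, parameter values, and the IRLS solver below are illustrative, not taken from the paper):

```python
import numpy as np

# Kernel-based regression with a Lipschitz (Huber-type) loss, solved by
# iteratively reweighted least squares (IRLS) in the RKHS of a Gaussian kernel.
# A gross outlier gets a small weight and barely moves the fit elsewhere.

def rbf(X1, X2, gamma=1.0):
    """Gaussian kernel matrix k(x, x') = exp(-gamma (x - x')^2) for 1-D inputs."""
    return np.exp(-gamma * (X1[:, None] - X2[None, :]) ** 2)

def kbr_huber(X, y, lam=1e-3, gamma=1.0, c=0.1, iters=50):
    """Fit f = sum_i alpha_i k(., x_i), minimizing mean Huber loss + lam ||f||_H^2."""
    n = len(X)
    K = rbf(X, X, gamma)
    alpha = np.zeros(n)
    for _ in range(iters):
        r = y - K @ alpha                                      # current residuals
        # Huber weights psi(r)/r: 1 for small residuals, c/|r| beyond the cutoff.
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        # Reweighted kernel ridge update: (W K + n*lam*I) alpha = W y
        alpha = np.linalg.solve(np.diag(w) @ K + n * lam * np.eye(n), w * y)
    return lambda Xnew: rbf(Xnew, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = np.linspace(0.0, 3.0, 40)
y = np.sin(X)
y[5] += 10.0                                                   # one gross outlier
f = kbr_huber(X, y, lam=1e-3, gamma=2.0)

mask = np.ones(40, bool)
mask[5] = False
print(np.max(np.abs(f(X)[mask] - np.sin(X)[mask])))            # stays small
```

With the least-squares loss (all weights fixed to 1) the same outlier would visibly distort the fit in a neighborhood of the contaminated point, which is the practical content of choosing a loss with bounded influence.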

MSC:

62G08 Nonparametric regression and quantile regression
46N30 Applications of functional analysis in probability theory and statistics
65K99 Numerical methods for mathematical programming, optimization and variational techniques

References:

[1] Akerkar, R. (1999). Nonlinear Functional Analysis . New Delhi: Narosa Publishing House. · Zbl 0976.46033
[2] Brown, A. and Pearcy, C. (1977). Introduction to Operator Theory I . New York: Springer. · Zbl 0371.47001
[3] Cheney, W. (2001). Analysis for Applied Mathematics . New York: Springer. · Zbl 0984.46006
[4] Christmann, A. (2004). An approach to model complex high-dimensional insurance data. Allg. Statist. Archiv 88 375-396. · Zbl 1059.62108 · doi:10.1007/s101820400178
[5] Christmann, A. and Steinwart, I. (2004). On robust properties of convex risk minimization methods for pattern recognition. J. Mach. Learn. Res. 5 1007-1034. · Zbl 1222.68348
[6] Christmann, A. and Steinwart, I. (2006). Consistency of kernel-based quantile regression. · Zbl 1197.62034 · doi:10.1002/asmb.700
[7] Davies, P. (1993). Aspects of robust linear regression. Ann. Statist. 21 1843-1899. · Zbl 0797.62026 · doi:10.1214/aos/1176349401
[8] De Vito, E., Rosasco, L., Caponnetto, A., Piana, M. and Verri, A. (2004). Some properties of regularized kernel methods. J. Mach. Learn. Res. 5 1363-1390. · Zbl 1222.68181
[9] Diestel, J. and Uhl, J. (1977). Vector Measures . Providence: American Mathematical Society. · Zbl 0369.46039
[10] Diestel, J., Jarchow, H. and Tonge, A. (1995). Absolutely Summing Operators . Cambridge University Press. · Zbl 0855.47016
[11] Dudley, R. (2002). Real Analysis and Probability . Cambridge University Press. · Zbl 1023.60001
[12] Györfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression . New York: Springer. · Zbl 1021.62024
[13] Hampel, F. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031 · doi:10.2307/2285666
[14] Hampel, F., Ronchetti, E., Rousseeuw, P. and Stahel, W. (1986). Robust Statistics. The Approach Based on Influence Functions . New York: Wiley. · Zbl 0593.62027
[15] Hoffmann-Jørgensen, J. (1974). Sums of independent Banach space valued random variables. Studia Math. 52 159-186. · Zbl 0265.60005
[16] Huber, P. (1981). Robust Statistics . New York: Wiley. · Zbl 0536.62025
[17] Phelps, R. (1986). Convex Functions, Monotone Operators and Differentiability . Lecture Notes in Math. 1364 . Berlin: Springer. · Zbl 0921.46039
[18] Poggio, T., Rifkin, R., Mukherjee, S. and Niyogi, P. (2004). General conditions for predictivity in learning theory. Nature 428 419-422.
[19] Rousseeuw, P.J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880. · Zbl 0547.62046 · doi:10.2307/2288718
[20] Schölkopf, B. and Smola, A. (2002). Learning with Kernels . MIT Press. · Zbl 1019.68094
[21] Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2 67-93. · Zbl 1009.68143 · doi:10.1162/153244302760185252
[22] Steinwart, I. (2004). Sparseness of support vector machines. J. Mach. Learn. Res. 4 1071-1105. · Zbl 1094.68082 · doi:10.1162/1532443041827925
[23] Steinwart, I. (2005). Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inform. Theory 51 128-142. · Zbl 1304.62090 · doi:10.1109/TIT.2004.839514
[24] Steinwart, I. (2006). How to compare different loss functions and their risks. Constr. Approx. · Zbl 1127.68089 · doi:10.1007/s00365-006-0662-3
[25] Steinwart, I., Hush, D. and Scovel, C. (2006). Function classes that approximate the Bayes risk. In Proceedings of the 19th Annual Conference on Learning Theory, COLT 2006 79-93. Lecture Notes in Comput. Sci. 4005 . Berlin: Springer. · Zbl 1143.68562 · doi:10.1007/11776420_9
[26] Suykens, J., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002). Least Squares Support Vector Machines . Singapore: World Scientific. · Zbl 1017.93004
[27] Tukey, J. (1977). Exploratory Data Analysis . Reading, MA: Addison-Wesley. · Zbl 0409.62003
[28] Vapnik, V. (1998). Statistical Learning Theory . New York: Wiley. · Zbl 0935.62007
[29] Wahba, G. (1990). Spline Models for Observational Data . Series in Applied Mathematics 59 . Philadelphia: SIAM. · Zbl 0813.62001
[30] Zhang, T. (2001). Convergence of large margin separable linear classification. In Advances in Neural Information Processing Systems 13 357-363.