Robust mixture regression using the \(t\)-distribution. (English) Zbl 1471.62227

Summary: Traditional estimation of mixture regression models assumes normally distributed component errors and is therefore sensitive to outliers and heavy-tailed errors. A robust mixture regression model based on the \(t\)-distribution is proposed by extending the mixture of \(t\)-distributions to the regression setting. This model, however, is still not robust to high leverage outliers. To overcome this, a modified version of the method is also proposed, which fits the \(t\)-based mixture regression to the data after adaptively trimming high leverage points. Furthermore, it is proposed to choose the degrees of freedom of the \(t\)-distribution adaptively using the profile likelihood. The resulting robust mixture regression estimate has high efficiency owing to the adaptive choice of the degrees of freedom.
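
The three ingredients named in the summary (a \(t\)-based mixture regression fit, trimming of high leverage points, and a profile likelihood choice of the degrees of freedom) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a single predictor, two components, a degrees-of-freedom value \(\nu\) shared by all components, the standard EM updates for \(t\) mixtures carried over to regression, a median/MAD rule standing in for whatever trimming rule the paper uses, a hand-picked grid of \(\nu\) values, and the hypothetical function names `fit_t_mixreg`, `trim_leverage` and `choose_nu`.

```python
import math
import random
import statistics

def t_logpdf(r, sigma, nu):
    """Log density at residual r of a t-distribution with scale sigma and nu d.f."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p((r / sigma) ** 2 / nu))

def wls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_t_mixreg(x, y, nu, m=2, iters=100, seed=0):
    """EM sketch for an m-component mixture of simple regressions with t errors.

    Assumption: nu is fixed and common to all components; one random start.
    """
    rng = random.Random(seed)
    n = len(x)
    p = []  # p[i][j]: responsibility of component j for observation i
    for _ in range(n):
        row = [rng.random() for _ in range(m)]
        p.append([v / sum(row) for v in row])
    u = [[1.0] * m for _ in range(n)]  # t-weights that downweight large residuals
    for _ in range(iters):
        # M-step: weighted least squares per component, with weights p * u
        pis, coefs, sigmas = [], [], []
        for j in range(m):
            pj = [p[i][j] for i in range(n)]
            w = [p[i][j] * u[i][j] for i in range(n)]
            a, b = wls(x, y, w)
            s2 = sum(wi * (yi - a - b * xi) ** 2
                     for wi, xi, yi in zip(w, x, y)) / sum(pj)
            pis.append(sum(pj) / n)
            coefs.append((a, b))
            sigmas.append(max(math.sqrt(s2), 1e-6))  # floor guards against collapse
        # E-step: update responsibilities and t-weights, accumulate log-likelihood
        loglik = 0.0
        for i in range(n):
            logs = []
            for j in range(m):
                r = y[i] - coefs[j][0] - coefs[j][1] * x[i]
                logs.append(math.log(pis[j]) + t_logpdf(r, sigmas[j], nu))
                u[i][j] = (nu + 1) / (nu + (r / sigmas[j]) ** 2)
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            p[i] = [math.exp(v - lse) for v in logs]
            loglik += lse
    return {"pi": pis, "coef": coefs, "sigma": sigmas, "loglik": loglik}

def trim_leverage(x, y, cutoff=2.24):
    """Drop points whose predictor is far from the median (simplified stand-in rule)."""
    med = statistics.median(x)
    mad = 1.4826 * statistics.median([abs(xi - med) for xi in x])
    keep = [i for i, xi in enumerate(x) if abs(xi - med) / mad <= cutoff]
    return [x[i] for i in keep], [y[i] for i in keep]

def choose_nu(x, y, grid=(1, 2, 3, 5, 8, 15, 30), **kw):
    """Profile likelihood over an assumed grid: keep the nu with the largest log-likelihood."""
    fits = {nu: fit_t_mixreg(x, y, nu, **kw) for nu in grid}
    best = max(fits, key=lambda nu: fits[nu]["loglik"])
    return best, fits[best]

# Synthetic data (illustrative only): two regression lines plus one high leverage point.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [(1 + 2 * xi if i % 2 else -1 - 2 * xi) + rng.gauss(0, 0.5)
     for i, xi in enumerate(x)]
x.append(20.0)
y.append(0.0)
xt, yt = trim_leverage(x, y)
best_nu, fit = choose_nu(xt, yt)
```

The sketch trims first and fits afterwards, mirroring the order described in the summary. A single random start is used only to keep the example short; EM for mixtures is sensitive to initialization, so several starts would be advisable in practice.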

MSC:

62-08 Computational methods for problems pertaining to statistics
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
62F10 Point estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

robustbase
