Generalized linear mixed model with a penalized Gaussian mixture as a random effects distribution. (English) Zbl 1452.62538

Summary: Generalized linear mixed models are popular for regressing a discrete response when the data are clustered, e.g. in longitudinal studies or in hierarchical data structures. The random effects are usually assumed to be normally distributed. Recent work has examined whether wrongly assuming normality of the random effects matters for the estimation of the fixed effects parameters. While misspecifying the distribution of the random effects has been shown to have only a minor effect in linear mixed models, the conclusion for generalized linear mixed models is less clear: some studies report a minor impact, while others report that the normality assumption matters, especially when the variance of the random effects is relatively high. Since it is rarely clear whether the normality assumption holds in practice, generalized linear mixed models that relax it are needed. The proposed model replaces the normal distribution with a mixture of Gaussian distributions specified on a grid, in which only the weights of the mixture components are estimated; a penalized approach ensures a smooth distribution for the random effects. The parameters of the model are estimated in a Bayesian framework using MCMC techniques. The usefulness of the approach is illustrated on two longitudinal studies using R functions.
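
To make the idea of a penalized Gaussian mixture on a grid concrete, the following is a minimal Python sketch, not the authors' R implementation. The summary only states that the component means lie on a fixed grid, that only the weights are estimated, and that a penalty enforces smoothness. Everything else here is an assumption made for illustration: the grid (`knots`), the common component standard deviation (`sigma`), the softmax mapping from unconstrained coefficients `a` to weights, and the use of a squared finite-difference penalty on `a`.

```python
import math


def mixture_weights(a):
    """Map unconstrained coefficients to weights that are positive and sum to 1.

    The softmax parameterization is an assumption of this sketch.
    """
    m = max(a)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]


def mixture_density(b, knots, sigma, weights):
    """Density at b of a Gaussian mixture with fixed means and a common sd."""
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for mu, w in zip(knots, weights):
        z = (b - mu) / sigma
        total += w * math.exp(-0.5 * z * z) / norm_const
    return total


def difference_penalty(a, order=2):
    """Sum of squared finite differences of the given order.

    Large values indicate neighbouring coefficients that change abruptly,
    i.e. a rough (non-smooth) random effects density.
    """
    d = list(a)
    for _ in range(order):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return sum(x * x for x in d)


# Hypothetical grid and coefficients, chosen only for illustration.
knots = [-3.0 + 0.5 * j for j in range(13)]   # fixed component means
sigma = 0.35                                  # fixed common standard deviation
a = [-0.5 * mu * mu for mu in knots]          # coefficients with a bell shape
weights = mixture_weights(a)

print(mixture_density(0.0, knots, sigma, weights))
print(difference_penalty(a, order=2))
```

In a Bayesian fit, a term proportional to `difference_penalty(a)` would typically enter the log-prior of the coefficients, so that the sampler favours smooth weight sequences. How the penalty is weighted and which difference order is used are modelling choices that this sketch leaves open.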

MSC:

62J12 Generalized linear models (logistic models)
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
62-08 Computational methods for problems pertaining to statistics
