Generalized linear model selection using \(R^2\). (English) Zbl 1146.62052

Summary: The problem of model selection in generalized linear models amounts to selecting a subset of useful covariates from a set of possible covariates and choosing a link function from a set of possible link functions. A model selection procedure based on a modified \(R^{2}\) statistic is proposed. As in linear models, \(R^{2}\) statistics in generalized linear models are used to quantify the proportion of variance in the response explained by covariates. Model selection based on \(R^{2}\) statistics is therefore natural for investigators who are already familiar with them. The modified \(R^{2}\) statistic is obtained by introducing an extra penalty term on the complexity of the candidate model. Under weak conditions, the proposed procedure is shown to be consistent in the sense that, with probability tending to one as the sample size increases, the selected model coincides with the optimal model relating the response to the covariates. Simulation results are presented to demonstrate the effectiveness of the proposed procedure in finite sample applications.
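
The selection idea in the summary can be illustrated with a minimal sketch. The summary does not state the exact form of the penalty, so the criterion below (a sum-of-squares \(R^{2}\) minus `penalty_weight * n_params / n`, with `penalty_weight = log(n)` by default) is an assumption made purely for illustration and is not the statistic defined in the paper. The function names, the candidate labels and the fitted means are likewise hypothetical; in practice the fitted means would come from fitting each candidate covariate subset and link function.

```python
import math


def r_squared(y, mu):
    """Sum-of-squares R^2 = 1 - SSE/SST, where mu holds the fitted means."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def penalized_r_squared(y, mu, n_params, penalty_weight=None):
    """R^2 minus a complexity penalty (assumed form, not the paper's)."""
    n = len(y)
    if penalty_weight is None:
        penalty_weight = math.log(n)  # assumed default, chosen for illustration
    return r_squared(y, mu) - penalty_weight * n_params / n


def select_model(y, candidates):
    """candidates maps a label to (fitted means, number of parameters)."""
    scores = {
        name: penalized_r_squared(y, mu, n_params)
        for name, (mu, n_params) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical binary response and fitted means from two candidate models.
y = [0, 0, 0, 1, 0, 1, 1, 1]
candidates = {
    "small": ([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9], 2),
    "large": ([0.1, 0.2, 0.3, 0.5, 0.5, 0.7, 0.8, 0.9], 5),
}
best, scores = select_model(y, candidates)
print(best, scores)
```

In this toy example the larger model has a slightly higher unpenalized \(R^{2}\), but the penalty on its three extra parameters outweighs the gain, so the smaller model is selected.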

MSC:

62J12 Generalized linear models (logistic models)
62F99 Parametric inference
65C60 Computational problems in statistics (MSC2010)

References:

[1] Akaike, H., Information theory and an extension of the maximum likelihood principle, (Petrov, B. N.; Csáki, F., Second International Symposium on Information Theory (1973), Akadémiai Kiadó: Akadémiai Kiadó Budapest), 267-281 · Zbl 0283.62006
[2] Akaike, H., A new look at the statistical model identification, IEEE Trans. Automat. Control, 19, 716-723 (1974) · Zbl 0314.62039
[3] Bai, Z.; Rao, C. R.; Wu, Y., Model selection with data-oriented penalty, J. Statist. Plann. Inference, 77, 103-117 (1999) · Zbl 0926.62045
[4] Burnham, K. P.; Anderson, D. R., Model Selection and Multi-model Inference (2002), Springer: Springer Berlin · Zbl 1005.62007
[5] Czado, C., On selecting parametric link transformation families in generalized linear models, J. Statist. Plann. Inference, 61, 125-139 (1997) · Zbl 0879.62060
[6] Czado, C.; Munk, A., Noncanonical links in generalized linear models—when is the effort justified?, J. Statist. Plann. Inference, 87, 317-345 (2000) · Zbl 0969.62048
[7] Draper, N. R.; Smith, H., Applied Regression Analysis (1998), Wiley: Wiley New York · Zbl 0158.17101
[8] Hu, B.; Palta, M.; Shao, J., Properties of \(R^2\) statistics for logistic regression, Statist. Medicine, 25, 1383-1395 (2006)
[9] Konishi, S.; Kitagawa, G., Generalised information criteria in model selection, Biometrika, 83, 875-890 (1996) · Zbl 0883.62004
[10] Mallows, C. L., Some comments on \(C_p\), Technometrics, 15, 661-675 (1973) · Zbl 0269.62061
[11] McQuarrie, A. D.R.; Tsai, C.-L., Regression and Time Series Model Selection (1998), World Scientific: World Scientific Singapore · Zbl 0907.62095
[12] Mittlböck, M.; Schemper, M., Explained variation for logistic regression—small sample adjustments, confidence intervals and predictive precision, Biometrical J., 44, 263-272 (2002) · Zbl 1441.62440
[13] Mittlböck, M.; Schemper, M., Explained variation for logistic regression, Statist. Medicine, 15, 1987-1997 (1996)
[14] Pregibon, D., Goodness of link test for generalized models, Appl. Statist., 29, 15-24 (1980) · Zbl 0434.62048
[15] Qian, G.; Gabor, G.; Gupta, R. P., Generalized linear model selection by the predictive least quasi-deviance criterion, Biometrika, 83, 41-54 (1996) · Zbl 0865.62050
[16] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 461-464 (1978) · Zbl 0379.62005
[17] Shao, J., Linear model selection by cross validation, J. Amer. Statist. Assoc., 422, 484-494 (1993) · Zbl 0773.62051
[18] Shao, J., Bootstrap model selection, J. Amer. Statist. Assoc., 434, 655-665 (1996) · Zbl 0869.62030
[19] Sugiura, N., Further analysis of the data by Akaike’s information criterion and the finite corrections, Comm. Statist. Theory Methods, A7, 13-26 (1978) · Zbl 0382.62060
[20] White, H., Maximum likelihood estimation of misspecified models, Econometrica, 50, 1-25 (1982) · Zbl 0478.62088
[21] Zhang, P., On the distributional properties of model selection criteria, J. Amer. Statist. Assoc., 419, 732-737 (1992) · Zbl 0781.62106
[22] Zheng, X.; Loh, W.-Y., Consistent variable selection in linear models, J. Amer. Statist. Assoc., 429, 151-156 (1995) · Zbl 0818.62060
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.