The spike-and-slab LASSO. (English) Zbl 1398.62186

Summary: Despite the wide adoption of spike-and-slab methodology for Bayesian variable selection, its potential for penalized likelihood estimation has largely been overlooked. In this article, we bridge this gap by cross-fertilizing these two paradigms with the Spike-and-Slab LASSO procedure for variable selection and parameter estimation in linear regression. We introduce a new class of self-adaptive penalty functions that arise from a fully Bayes spike-and-slab formulation, ultimately moving beyond the separable penalty framework. A virtue of these nonseparable penalties is their ability to borrow strength across coordinates, adapt to ensemble sparsity information and exert multiplicity adjustment. The Spike-and-Slab LASSO procedure harvests efficient coordinate-wise implementations with a path-following scheme for dynamic posterior exploration. We show on simulated data that the fully Bayes penalty mimics oracle performance, providing a viable alternative to cross-validation. We develop theory for the separable and nonseparable variants of the penalty, showing rate-optimality of the global mode as well as optimal posterior concentration when \(p > n\).
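The self-adaptive penalty arises from a mixture of two Laplace priors: a concentrated "spike" (large rate \(\lambda_0\)) and a diffuse "slab" (small rate \(\lambda_1\)). As a rough illustrative sketch of this mechanism, assuming a simplified soft-thresholding-style coordinate update (the function names, parameter values, and the update rule here are illustrative, not the authors' reference implementation):

```python
import numpy as np

# Illustrative sketch of the spike-and-slab LASSO penalty weighting.
# The prior on each coefficient is a two-point Laplace mixture:
#   p(beta | theta) = theta * psi(beta; lam1) + (1 - theta) * psi(beta; lam0),
# with small lam1 (slab) and large lam0 (spike).

def laplace(beta, lam):
    """Laplace (double-exponential) density with rate lam."""
    return 0.5 * lam * np.exp(-lam * np.abs(beta))

def pstar(beta, theta, lam1, lam0):
    """Conditional probability that beta came from the slab."""
    slab = theta * laplace(beta, lam1)
    spike = (1 - theta) * laplace(beta, lam0)
    return slab / (slab + spike)

def lam_star(beta, theta, lam1, lam0):
    """Self-adaptive penalty weight: blends lam1 and lam0 by pstar,
    so small coefficients feel lam0 and large ones feel lam1."""
    p = pstar(beta, theta, lam1, lam0)
    return lam1 * p + lam0 * (1 - p)

def ssl_coordinate_step(z, beta_old, theta, lam1, lam0, n=1.0):
    """One simplified coordinate update: soft-threshold the
    partial-residual estimate z at the adaptive weight lam_star."""
    lam = lam_star(beta_old, theta, lam1, lam0)
    return np.sign(z) * max(abs(z) - lam / n, 0.0)
```

The key point is that the effective penalty is not fixed: coordinates judged likely to come from the slab are shrunk lightly (near \(\lambda_1\)), while near-zero coordinates are shrunk aggressively (near \(\lambda_0\)), and updating \(\theta\) across coordinates is what lets the nonseparable variant borrow strength and adjust for multiplicity.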

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F15 Bayesian inference
Full Text: DOI

References:

[1] Armero, C.; Bayarri, M., Prior assessments for prediction in queues, The Statistician, 45, 139-153, (1996)
[2] Bellec, P.; Lecué, G.; Tsybakov, A., SLOPE meets Lasso: improved oracle bounds and optimality, (2016)
[3] Bhattacharya, A.; Pati, D.; Pillai, N.; Dunson, D., Dirichlet-Laplace priors for optimal shrinkage, Journal of the American Statistical Association, 110, 1479-1490, (2015) · Zbl 1373.62368
[4] Breheny, P.; Huang, J., Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Annals of Applied Statistics, 5, 232-253, (2011) · Zbl 1220.62095
[5] Bühlmann, P.; van de Geer, S., Statistics for High-Dimensional Data, (2011), Springer, New York · Zbl 1273.62015
[6] Candes, E.; Wakin, M.; Boyd, S., Enhancing sparsity by reweighted \(\ell_1\) minimization, Journal of Fourier Analysis and Applications, 14, 877-905, (2008) · Zbl 1176.94014
[7] Carvalho, C.; Polson, N.; Scott, J., The horseshoe estimator for sparse signals, Biometrika, 97, 465-480, (2010) · Zbl 1406.62021
[8] Castillo, I.; Schmidt-Hieber, J.; van der Vaart, A., Bayesian linear regression with sparse priors, The Annals of Statistics, 43, 1986-2018, (2015) · Zbl 1486.62197
[9] Castillo, I.; van der Vaart, A., Needles and straw in a haystack: posterior concentration for possibly sparse sequences, The Annals of Statistics, 40, 2069-2101, (2012) · Zbl 1257.62025
[10] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 1348-1360, (2001) · Zbl 1073.62547
[11] Fan, Y.; Lv, J., Asymptotic properties for combined L1 and concave regularization, Biometrika, 101, 57-70, (2014) · Zbl 1285.62074
[12] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, 33, 1-22, (2010)
[13] George, E. I.; McCulloch, R. E., Variable selection via Gibbs sampling, Journal of the American Statistical Association, 88, 881-889, (1993)
[14] Gradshteyn, I.; Ryzhik, I., Table of Integrals, Series, and Products, (2000), Academic Press, Cambridge, MA · Zbl 0981.65001
[15] Ismail, M.; Pitman, J., Algebraic evaluations of some Euler integrals, duplication formulae for Appell’s hypergeometric function \(F_1\), and Brownian variations, Canadian Journal of Mathematics, 52, 961-981, (2000) · Zbl 0961.33012
[16] Johnstone, I. M.; Silverman, B. W., Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences, The Annals of Statistics, 32, 1594-1649, (2004) · Zbl 1047.62008
[17] Loh, P.; Wainwright, M., Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, Journal of Machine Learning Research, 16, 559-616, (2015)
[18] Martin, R.; Walker, S., Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector, Electronic Journal of Statistics, 8, 2188-2206, (2014) · Zbl 1302.62015
[19] Mazumder, R.; Friedman, J.; Hastie, T., SparseNet: coordinate descent with nonconvex penalties, Journal of the American Statistical Association, 106, 1125-1138, (2011) · Zbl 1229.62091
[20] Moreno, E.; Girón, J.; Casella, G., Posterior model consistency in variable selection as the model dimension grows, Statistical Science, 30, 228-241, (2015) · Zbl 1332.62100
[21] Polson, N.; Scott, J., Shrink globally, act locally: sparse Bayesian regularization and prediction, Bayesian Statistics, 9, 501-539, (2010)
[22] Rockova, V., Bayesian estimation of sparse signals with a continuous spike-and-slab prior, Annals of Statistics, forthcoming, (2015)
[23] Rockova, V., Particle EM for variable selection, Journal of the American Statistical Association, forthcoming, (2016)
[24] Rockova, V.; George, E., EMVS: the EM approach to Bayesian variable selection, Journal of the American Statistical Association, 109, 828-846, (2014) · Zbl 1367.62049
[25] Rockova, V.; George, E., Bayesian penalty mixing: the case of a non-separable penalty, Statistical Analysis for High-Dimensional Data, Abel Symposia, 11, 233-254, (2016) · Zbl 1384.62101
[26] Scheipl, F.; Fahrmeir, L.; Kneib, T., Spike-and-slab priors for function selection in structured additive regression models, Journal of the American Statistical Association, 107, 1518-1532, (2012) · Zbl 1258.62082
[27] Scott, J. G.; Berger, J. O., Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem, The Annals of Statistics, 38, 2587-2619, (2010) · Zbl 1200.62020
[28] She, Y., Thresholding-based iterative selection procedures for model selection and shrinkage, Electronic Journal of Statistics, 3, 384-415, (2009) · Zbl 1326.62158
[29] Su, W.; Candes, E., SLOPE is adaptive to unknown sparsity and asymptotically minimax, The Annals of Statistics, 44, 1038-1068, (2016) · Zbl 1338.62032
[30] Tibshirani, R., Regression shrinkage and selection via the LASSO, Journal of the Royal Statistical Society, Series B, 58, 267-288, (1996) · Zbl 0850.62538
[31] van der Pas, S.; Kleijn, B.; van der Vaart, A., The horseshoe estimator: posterior concentration around nearly black vectors, Electronic Journal of Statistics, 8, 2585-2618, (2014) · Zbl 1309.62060
[32] Wang, Z.; Liu, H.; Zhang, T., Optimal computational and statistical rates of convergence for sparse nonconvex learning problems, The Annals of Statistics, 42, 2164-2201, (2014) · Zbl 1302.62066
[33] Zhang, C.; Zhang, T., A general theory of concave regularization for high-dimensional sparse estimation problems, Statistical Science, 27, 576-593, (2012) · Zbl 1331.62353
[34] Zhang, C. H., Nearly unbiased variable selection under minimax concave penalty, The Annals of Statistics, 38, 894-942, (2010) · Zbl 1183.62120
[35] Zou, H., The adaptive LASSO and its oracle properties, Journal of the American Statistical Association, 101, 1418-1429, (2006) · Zbl 1171.62326
[36] Zou, H.; Li, R., One-step sparse estimates in nonconcave penalized likelihood models, The Annals of Statistics, 36, 1509-1533, (2008) · Zbl 1142.62027