A computationally efficient projection-based approach for spatial generalized linear mixed models. (English) Zbl 07498984

Summary: Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The computational challenge arises both from the high-dimensional random effects and from the slow mixing of Markov chain Monte Carlo (MCMC) algorithms for these models. Moreover, spatial confounding inflates the variance of fixed-effect (regression coefficient) estimates. Our approach addresses both the computational and the confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well in this setting, and the reduced dimension speeds up the computations per iteration. We show via simulated examples that Bayesian inference for this reduced-dimensional approach works well for both inference and prediction; our methods also compare favorably with existing “reduced-rank” approaches. We also apply our methods to two real-world data examples, one on bird count data and the other on classifying rock types. Supplementary material for this article is available online.
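The summary describes replacing the n spatial random effects with a reduced-dimensional representation built from random projections. A minimal NumPy sketch of that idea follows, assuming an exponential covariance and a Gaussian random projection in the style of the randomized range finder of Halko, Martinsson and Tropp (2011). The function names, the covariance parameters `sigma2` and `phi`, the rank `m` and the oversampling size are illustrative assumptions, not the authors' code. The optional last step projects the basis onto the orthogonal complement of the covariates, the restricted-spatial-regression device discussed by Reich, Hodges and Zadnik (2006) and Hughes and Haran (2013); whether the paper uses exactly this construction is also an assumption.

```python
import numpy as np


def exp_cov(coords, sigma2=1.0, phi=0.2):
    """Exponential covariance sigma2 * exp(-d / phi) (illustrative choice)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return sigma2 * np.exp(-d / phi)


def random_projection_eig(K, m, oversample=10, seed=0):
    """Approximate the m leading eigenpairs of a symmetric PSD matrix K
    with a Gaussian random projection (randomized range finder)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    omega = rng.standard_normal((n, m + oversample))  # random test matrix
    Q, _ = np.linalg.qr(K @ omega)                    # orthonormal basis of range(K @ omega)
    B = Q.T @ K @ Q                                   # small (m+p) x (m+p) matrix
    evals, evecs = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:m]                 # indices of the m largest
    d = np.maximum(evals[idx], 0.0)                   # guard against tiny negative round-off
    return d, Q @ evecs[:, idx]                       # eigenvalues, n x m basis


def restrict_to_orth_complement(X, M):
    """Project the columns of M onto the orthogonal complement of col(X)."""
    Qx, _ = np.linalg.qr(X)
    return M - Qx @ (Qx.T @ M)


rng = np.random.default_rng(1)
n, m = 400, 50
coords = rng.uniform(size=(n, 2))        # simulated locations in the unit square
K = exp_cov(coords)

d, U = random_projection_eig(K, m)
K_hat = (U * d) @ U.T                    # rank-m approximation U diag(d) U^T
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)

# Reduced-dimensional random effects: w = U diag(sqrt(d)) delta, delta ~ N(0, I_m)
delta = rng.standard_normal(m)
w = U @ (np.sqrt(d) * delta)

# Optional (assumed): make the basis orthogonal to the covariates to reduce confounding
X = np.column_stack([np.ones(n), coords])
U_perp = restrict_to_orth_complement(X, U)

print(U.shape, w.shape, round(float(rel_err), 3))
```

In this sketch the MCMC would sample the m-dimensional vector `delta` instead of all n spatial random effects, which is where the per-iteration savings come from; the oversampling columns make the captured subspace closer to the true leading eigenspace at little extra cost.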

MSC:

62-XX Statistics

Software:

CODA; R; GMRFLib; spBayes; FRK

References:

[1] Adler, R. J., An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes, Lecture Notes—Monograph Series, 12, 1-155 (1990) · Zbl 0747.60039
[2] Banerjee, A.; Dunson, D. B.; Tokdar, S. T., Efficient Gaussian Process Regression for Large Datasets, Biometrika, 100, 75-89 (2012) · Zbl 1284.62257
[3] Banerjee, S.; Gelfand, A. E.; Finley, A. O.; Sang, H., Gaussian Predictive Process Models for Large Spatial Data Sets, Journal of the Royal Statistical Society, Series B, 70, 825-848 (2008) · Zbl 1533.62065
[4] Belabbas, M.-A.; Wolfe, P. J., Spectral Methods in Machine Learning and New Strategies for Very Large Datasets, Proceedings of the National Academy of Sciences, 106, 369-374 (2009)
[5] Berrett, C.; Calder, C. A., Bayesian Spatial Binary Classification, Spatial Statistics, 16, 72-102 (2016)
[6] Besag, J.; York, J.; Mollié, A., Bayesian Image Restoration, With Two Applications in Spatial Statistics, Annals of the Institute of Statistical Mathematics, 43, 1-20 (1991) · Zbl 0760.62029
[7] Bingham, E.; Mannila, H., Random Projection in Dimensionality Reduction: Applications to Image and Text Data, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: ACM, 245-250 (2001)
[8] Christensen, O. F.; Roberts, G. O.; Sköld, M., Robust Markov Chain Monte Carlo Methods for Spatial Generalized Linear Mixed Models, Journal of Computational and Graphical Statistics, 15, 1-17 (2006)
[9] Computational and Information Systems Laboratory, Yellowstone: IBM iDataPlex System (NCAR Community Computing) (2016), Boulder, CO: National Center for Atmospheric Research
[10] Cressie, N.; Johannesson, G., Fixed Rank Kriging for Very Large Spatial Data Sets, Journal of the Royal Statistical Society, Series B, 70, 209-226 (2008) · Zbl 05563351
[11] Cressie, N.; Wikle, C. K., Statistics for Spatio-Temporal Data (2015), New York: Wiley
[12] Dasgupta, S.; Gupta, A., An Elementary Proof of a Theorem of Johnson and Lindenstrauss, Random Structures & Algorithms, 22, 60-65 (2003) · Zbl 1018.51010
[13] Datta, A.; Banerjee, S.; Finley, A. O.; Gelfand, A. E., Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets, Journal of the American Statistical Association, 111, 800-812 (2016)
[14] Deutsch, C. V.; Wang, L., Hierarchical Object-Based Stochastic Modeling of Fluvial Reservoirs, Mathematical Geology, 28, 857-880 (1996)
[15] Diggle, P. J.; Tawn, J. A.; Moyeed, R. A., Model-Based Geostatistics, Journal of the Royal Statistical Society, Series C, 47, 299-350 (1998) · Zbl 0904.62119
[16] Drineas, P.; Mahoney, M. W., On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, Journal of Machine Learning Research, 6, 2153-2175 (2005) · Zbl 1222.68186
[17] Elliott, J. J.; Arbib Jr, R. S., Origin and Status of the House Finch in the Eastern United States, The Auk, 70, 31-37 (1953)
[18] Finley, A. O.; Banerjee, S.; Gelfand, A. E., spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models, Journal of Statistical Software, 63 (2013)
[19] Finley, A. O.; Sang, H.; Banerjee, S.; Gelfand, A. E., Improving the Performance of Predictive Process Modeling for Large Datasets, Computational Statistics & Data Analysis, 53, 2873-2884 (2009) · Zbl 1453.62090
[20] Flegal, J. M.; Haran, M.; Jones, G. L., Markov Chain Monte Carlo: Can We Trust the Third Significant Figure?, Statistical Science, 23, 250-260 (2008) · Zbl 1327.62017
[21] Frieze, A.; Kannan, R.; Vempala, S., Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, Journal of the ACM (JACM), 51, 1025-1041 (2004) · Zbl 1125.65005
[22] Getis, A.; Griffith, D. A., Comparative Spatial Filtering in Regression Analysis, Geographical Analysis, 34, 130-140 (2002)
[23] Halko, N.; Martinsson, P.-G.; Tropp, J. A., Finding Structure With Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review, 53, 217-288 (2011) · Zbl 1269.65043
[24] Hanks, E. M.; Schliep, E. M.; Hooten, M. B.; Hoeting, J. A., Restricted Spatial Regression in Practice: Geostatistical Models, Confounding, and Robustness Under Model Misspecification, Environmetrics, 26, 243-254 (2015) · Zbl 1525.62132
[25] Haran, M., Gaussian Random Field Models for Spatial Data, Handbook of Markov Chain Monte Carlo, 449-478 (2011), Boca Raton, FL: Chapman and Hall/CRC · Zbl 1416.62546
[26] Haran, M.; Hodges, J. S.; Carlin, B. P., Accelerating Computation in Markov Random Field Models for Spatial Data via Structured MCMC, Journal of Computational and Graphical Statistics, 12, 249-264 (2003)
[27] Harville, D. A., Matrix Algebra From a Statistician’s Perspective (1997), New York: Springer · Zbl 0881.15001
[28] Higdon, D., A Process-Convolution Approach to Modelling Temperatures in the North Atlantic Ocean, Environmental and Ecological Statistics, 5, 173-190 (1998)
[29] Hodges, J. S.; Reich, B. J., Adding Spatially-Correlated Errors Can Mess up the Fixed Effect You Love, The American Statistician, 64, 325-334 (2010) · Zbl 1217.62095
[30] Homrighausen, D.; McDonald, D. J., On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Datasets, Journal of Computational and Graphical Statistics, 25, 344-362 (2016)
[31] Hughes, J.; Haran, M., Dimension Reduction and Alleviation of Confounding for Spatial Generalized Linear Mixed Models, Journal of the Royal Statistical Society, Series B, 75, 139-159 (2013) · Zbl 07555442
[32] John, A. K.; Lake, L. W.; Torres-Verdin, C.; Srinivasan, S., Seismic Facies Identification and Classification Using Simple Statistics, SPE Reservoir Evaluation & Engineering, 11, 984-990 (2008)
[33] Nychka, D.; Bandyopadhyay, S.; Hammerling, D.; Lindgren, F.; Sain, S., A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets, Journal of Computational and Graphical Statistics, 24, 579-599 (2015)
[34] Pardieck, K. L.; Ziolkowski, D. J.; Hudson, M. A. R.; Campbell, K., North American Breeding Bird Survey Dataset 1966-2015, version 2015.0 (2016)
[35] Plummer, M.; Best, N.; Cowles, K.; Vines, K., CODA: Convergence Diagnosis and Output Analysis for MCMC, R News, 6, 7-11 (2006)
[36] R Core Team, R: A Language and Environment for Statistical Computing (2013), Vienna, Austria: R Foundation for Statistical Computing
[37] Rasmussen, C. E.; Williams, C. K. I., Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) (2005), Cambridge, MA, and London: The MIT Press
[38] Reich, B. J.; Hodges, J. S.; Zadnik, V., Effects of Residual Smoothing on the Posterior of the Fixed Effects in Disease-Mapping Models, Biometrics, 62, 1197-1206 (2006) · Zbl 1114.62124
[39] Rue, H.; Held, L., Gaussian Markov Random Fields: Theory and Applications (2005), Boca Raton, FL: CRC Press · Zbl 1093.60003
[40] Rue, H.; Martino, S.; Chopin, N., Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations, Journal of the Royal Statistical Society, Series B, 71, 319-392 (2009) · Zbl 1248.62156
[41] Sang, H.; Huang, J. Z., A Full Scale Approximation of Covariance Functions for Large Spatial Data Sets, Journal of the Royal Statistical Society, Series B, 74, 111-132 (2012) · Zbl 1411.62274
[42] Sarlos, T., Improved Approximation Algorithms for Large Matrices via Random Projections, Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 143-152 (2006), Washington, DC: IEEE Computer Society
[43] Sengupta, A.; Cressie, N., Hierarchical Statistical Modeling of Big Spatial Datasets Using the Exponential Family of Distributions, Spatial Statistics, 4, 14-44 (2013)
[44] Sengupta, A.; Cressie, N.; Kahn, B. H.; Frey, R., Predictive Inference for Big, Spatial, Non-Gaussian Data: MODIS Cloud Data and Its Change-of-Support, Australian and New Zealand Journal of Statistics, 58, 15-45 (2016)
[45] Shaby, B.; Ruppert, D., Tapered Covariance: Bayesian Estimation and Asymptotics, Journal of Computational and Graphical Statistics, 21, 433-452 (2012)
[46] Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P.; van der Linde, A., Bayesian Measures of Model Complexity and Fit, Journal of the Royal Statistical Society, Series B, 64, 583-639 (2002) · Zbl 1067.62010
[47] Stein, M., Interpolation of Spatial Data: Some Theory for Kriging (Springer Series in Statistics) (1999), New York: Springer · Zbl 0924.62100
[48] Stein, M., Limitations on Low Rank Approximations for Covariance Matrices of Spatial Data, Spatial Statistics, 8, 1-19 (2014)
[49] Tropp, J. A., Improved Analysis of the Subsampled Randomized Hadamard Transform, Advances in Adaptive Data Analysis, 3, 115-126 (2011) · Zbl 1232.15029
[50] Williams, C.; Seeger, M., Using the Nyström Method to Speed Up Kernel Machines, Advances in Neural Information Processing Systems 13, Cambridge, MA, and London: The MIT Press, 682-688 (2001)

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.