A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes. (English) Zbl 07685222

Summary: Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. The method is tailored to handle large datasets, including those with missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data, which we develop using stochastic partial differential equations (SPDE) so that the latent processes have sparse precision matrices. In Stage 1, the observations are separated into zero and nonzero categories and modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially varying mean and variance profiles of the positive observations across the spatial locations, and then smooth those estimates using fixed rank kriging. This approximate Bayesian inference method is employed to avoid the high computational burden of modeling large spatial data with spatially varying coefficients. In Stage 3, we further model the standardized log-transformed positive observations from the second stage using a sparse bivariate spatial Gaussian process. The Gaussian assumption made for wildfire counts in the third stage is computationally convenient but incorrect. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using Random Forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. We then create a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model with those obtained using several competing methods.
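To make the zero-inflated decomposition behind Stages 1 to 3 concrete, the following is a minimal, non-spatial sketch and not the authors' implementation. It assumes a single location treated in isolation, replaces the hierarchical Bayesian model of Stage 1 with an empirical nonzero proportion, uses the raw empirical mean and standard deviation of the log-transformed positive values in place of the kriging-smoothed profiles of Stage 2, and reads the Gaussian assumption of Stage 3 as a log-normal law for the positive values. The function names `fit_location` and `exceedance_prob` are hypothetical, and the SPDE, fixed rank kriging, MCMC, and Random Forest components are omitted entirely.

```python
import math
from statistics import mean, pstdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_location(obs):
    """Empirical summaries for one location (hypothetical helper).

    `obs` is a list of non-negative values; None marks a missing observation.
    Returns (p_nonzero, mu, sigma), where mu and sigma are the mean and
    standard deviation of the log-transformed positive observations.
    """
    observed = [y for y in obs if y is not None]
    log_pos = [math.log(y) for y in observed if y > 0]
    if len(log_pos) < 2:
        raise ValueError("need at least two positive observations")
    p_nonzero = len(log_pos) / len(observed)  # crude stand-in for Stage 1
    mu = mean(log_pos)                        # unsmoothed Stage 2 mean
    sigma = pstdev(log_pos)                   # unsmoothed Stage 2 std. dev.
    if sigma == 0.0:
        raise ValueError("positive observations are all identical")
    return p_nonzero, mu, sigma


def exceedance_prob(u, p_nonzero, mu, sigma):
    """P(Y > u) for u > 0 under the zero-inflated log-normal assumption."""
    if u <= 0:
        raise ValueError("threshold u must be positive")
    z = (math.log(u) - mu) / sigma  # standardized log-threshold
    return p_nonzero * (1.0 - normal_cdf(z))


# Toy data for one location: two zeros, one missing value, two positives.
obs = [0, 0, None, math.exp(1.0), math.exp(3.0)]
p_nonzero, mu, sigma = fit_location(obs)
prob = exceedance_prob(math.exp(2.0), p_nonzero, mu, sigma)
```

In the toy call, the threshold sits at the mean on the log scale, so the exceedance probability is half of the nonzero proportion. In the method described above, probabilities of this kind are subsequently post-processed with Random Forests, since the Gaussian assumption does not hold for counts.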

MSC:

62G32 Statistics of extreme values; tail inference
62H11 Directional data; spatial statistics
62J05 Linear regression; mixed models
62J12 Generalized linear models (logistic models)
62P12 Applications of statistics to environmental and related topics

Software:

R-INLA; GMRFLib; FRK
