Gradient boosting with extreme-value theory for wildfire prediction. (English) Zbl 07685221

Summary: This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking.

MSC:

62G32 Statistics of extreme values; tail inference
62J99 Linear inference, regression
62P12 Applications of statistics to environmental and related topics

References:

[1] Breiman, L., Random forests, Mach. Learn., 45, 1, 5-32 (2001) · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[2] Breiman, L.; Friedman, JH; Olshen, RA; Stone, CJ, Classification and Regression Trees (1984), Monterey, CA: Wadsworth and Brooks · Zbl 0541.62042
[3] Brillinger, DR; Preisler, HK; Benoit, JW, Probabilistic risk assessment for wildfires, Environmetrics, 17, 6, 623-633 (2006) · doi:10.1002/env.768
[4] Bühlmann, P.; Hothorn, T., Boosting algorithms: regularization, prediction and model fitting, Stat. Sci., 22, 4, 477-505 (2007) · Zbl 1246.62163
[5] Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pp. 785-794. ACM, New York, NY, USA (2016)
[6] Cox, DR, The regression analysis of binary sequences (with discussion), J. Roy. Stat. Soc.: Ser. B (Methodol.), 20, 2, 215-232 (1958) · Zbl 0088.35703
[7] Cui, W.; Perera, AH, What do we know about forest fire size distribution, and why is this knowledge useful for forest management?, Int. J. Wildland Fire, 17, 2, 234-244 (2008) · doi:10.1071/WF06145
[8] Cumming, S., A parametric model of the fire-size distribution, Can. J. For. Res., 31, 8, 1297-1303 (2001) · doi:10.1139/x01-032
[9] Davison, AC; Smith, RL, Models for exceedances over high thresholds (with discussion), J. Roy. Stat. Soc.: Ser. B (Methodol.), 52, 3, 393-442 (1990) · Zbl 0706.62039
[10] De Angelis, A.; Ricotta, C.; Conedera, M.; Pezzatti, GB, Modelling the meteorological forest fire niche in heterogeneous pyrologic conditions, PLoS ONE, 10, 2, 1-17 (2015) · doi:10.1371/journal.pone.0116875
[11] De Zea Bermudez, P.; Mendes, J.; Pereira, JM; Turkman, KF; Vasconcelos, MJ, Spatial and temporal extremes of wildfire sizes in Portugal (1984-2004), Int. J. Wildland Fire, 18, 8, 983-991 (2009) · doi:10.1071/WF07044
[12] Diggle, PJ; Menezes, R.; Su, T-L, Geostatistical inference under preferential sampling (with discussion), J. Roy. Stat. Soc.: Ser. C (Appl. Stat.), 59, 2, 191-232 (2010)
[13] Dutta, R.; Aryal, J.; Das, A.; Kirkpatrick, JB, Deep cognitive imaging systems enable estimation of continental-scale fire incidence from climate data, Sci. Rep., 3, 1, 3188 (2013) · doi:10.1038/srep03188
[14] Friedman, JH, Greedy function approximation: a gradient boosting machine, Ann. Stat., 29, 1189-1232 (2001) · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[15] Friedman, J.; Hastie, T.; Tibshirani, R., Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat., 28, 2, 337-407 (2000) · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[16] Fuglstad, G-A; Simpson, D.; Lindgren, F.; Rue, H., Constructing priors that penalize the complexity of Gaussian random fields, J. Am. Stat. Assoc., 114, 525, 445-452 (2018) · Zbl 1478.62279 · doi:10.1080/01621459.2017.1415907
[17] Genton, MG; Butry, DT; Gumpertz, ML; Prestemon, JP, Spatio-temporal analysis of wildfire ignitions in the St Johns River water management district, Florida, Int. J. Wildland Fire, 15, 1, 87-97 (2006) · doi:10.1071/WF04034
[18] Gneiting, T.; Ranjan, R., Comparing density forecasts using threshold- and quantile-weighted scoring rules, J. Bus. Econ. Stat., 29, 3, 411-422 (2011) · Zbl 1219.91108 · doi:10.1198/jbes.2010.08110
[19] Greenwell, B.; Boehmke, B.; Cunningham, J.; GBM Developers, gbm: generalized boosted regression models, R package version 2.1.8 (2020)
[20] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning: Data Mining, Inference and Prediction (2009), Springer · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[21] Hitz, A., Davis, R., Samorodnitsky, G.: Discrete extremes. Preprint at https://arxiv.org/abs/1707.05033 (2017)
[22] Jain, P.; Coogan, SC; Subramanian, SG; Crowley, M.; Taylor, S.; Flannigan, MD, A review of machine learning applications in wildfire science and management, Environ. Rev., 28, 4, 478-505 (2020) · doi:10.1139/er-2020-0019
[23] Joseph, M.B., Rossi, M.W., Mietkiewicz, N.P., Mahood, A.L., Cattau, M.E., St. Denis, L.A., Nagy, R.C., Iglesias, V., Abatzoglou, J.T., Balch, J.K.: Spatiotemporal prediction of wildfire size extremes with Bayesian finite sample maxima. Ecol. Appl. 29(6), e01898 (2019)
[24] Koh, J., Pimont, F., Dupuy, J.-L., Opitz, T.: Spatiotemporal wildfire modeling through point processes with moderate and extreme marks. Ann Appl Stat. 17(1), 560-582 (2023) · Zbl 07656989
[25] Liang, H.; Zhang, M.; Wang, H., A neural network model for wildfire scale prediction using meteorological factors, IEEE Access, 7, 176746-176755 (2019) · doi:10.1109/ACCESS.2019.2957837
[26] Lindgren, F.; Rue, H.; Lindström, J., An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach (with discussion), J. R. Stat. Soc. Series B. Stat. Methodol., 73, 4, 423-498 (2011) · Zbl 1274.62360 · doi:10.1111/j.1467-9868.2011.00777.x
[27] Matheson, JE; Winkler, RL, Scoring rules for continuous probability distributions, Manage. Sci., 22, 10, 1087-1096 (1976) · Zbl 0349.62080 · doi:10.1287/mnsc.22.10.1087
[28] Mitsopoulos, I.; Mallinis, G., A data-driven approach to assess large fire size generation in Greece, Nat. Hazards, 88, 3, 1591-1607 (2017) · doi:10.1007/s11069-017-2934-z
[29] National Interagency Fire Center: Total wildfires and acres (2021). Data retrieved from https://www.predictiveservices.nifc.gov/intelligence/intelligence.htm. Accessed 17 Jun 2021
[30] Nelder, JA; Wedderburn, RWM, Generalized linear models, J. R. Stat. Soc. Ser. A Stat. Soc., 135, 3, 370-384 (1972) · doi:10.2307/2344614
[31] Opitz, T.: EVA 2021 data challenge (2021). https://www.maths.ed.ac.uk/school-of-mathematics/eva-2021/competitions/data-challenge. Accessed 17 Jun 2021
[32] Opitz, T.: Editorial: EVA 2021 data competition on spatio-temporal prediction of wildfire activity in the United States. Extremes, to appear (2022)
[33] Opitz, T., Bonneu, F., Gabriel, E.: Point-process based modeling of space-time structures of forest fire occurrences in Mediterranean France. Spat. Stat. 40, 100429 (2020)
[34] Opitz, T.; Huser, R.; Bakka, H.; Rue, H., INLA goes extreme: Bayesian tail regression for the estimation of high spatio-temporal quantiles, Extremes, 21, 3, 441-462 (2018) · Zbl 1407.62167 · doi:10.1007/s10687-018-0324-x
[35] Pati, D.; Reich, BJ; Dunson, DB, Bayesian geostatistical modelling with informative sampling locations, Biometrika, 98, 1, 35-48 (2011) · Zbl 1214.62029 · doi:10.1093/biomet/asq067
[36] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, É., Scikit-learn: machine learning in Python, J. Mach. Learn. Res., 12, 85, 2825-2830 (2011) · Zbl 1280.68189
[37] Peng, RD; Schoenberg, FP; Woods, JA, A space-time conditional intensity model for evaluating a wildfire hazard index, J. Am. Stat. Assoc., 100, 469, 26-35 (2005) · Zbl 1117.62411 · doi:10.1198/016214504000001763
[38] Pereira, J.M.C., Turkman, K.F.: Statistical models of vegetation fires: spatial and temporal patterns. In: Handbook of Environmental and Ecological Statistics, pp. 401-420. Chapman and Hall/CRC (2019)
[39] Pimont, F., Fargeon, H., Opitz, T., Ruffault, J., Barbero, R., Martin-StPaul, N., Rigolot, E. I., Rivière, M., Dupuy, J.-L.: Prediction of regional wildfire activity in the probabilistic Bayesian framework of Firelihood. Ecol. Appl. e02316 (2021)
[40] Pohjankukka, J.; Pahikkala, T.; Nevalainen, P.; Heikkonen, J., Estimating the prediction performance of spatial models via spatial k-fold cross validation, Int. J. Geogr. Inf. Sci., 31, 10, 2001-2019 (2017) · doi:10.1080/13658816.2017.1346255
[41] Preisler, HK; Brillinger, DR; Burgan, RE; Benoit, J., Probability based models for estimation of wildfire risk, Int. J. Wildland Fire, 13, 2, 133-142 (2004) · doi:10.1071/WF02061
[42] Prieto, F.; Gómez-Déniz, E.; Sarabia, JM, Modelling road accident blackspots data with the discrete generalized Pareto distribution, Accid. Anal. Prev., 71, 38-49 (2014) · doi:10.1016/j.aap.2014.05.005
[43] Rasmussen, CE; Williams, CKI, Gaussian Processes for Machine Learning (2005), The MIT Press · doi:10.7551/mitpress/3206.001.0001
[44] Roberts, DR; Bahn, V.; Ciuti, S.; Boyce, MS; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, JJ; Schröder, B.; Thuiller, W.; Warton, DI; Wintle, BA; Hartig, F.; Dormann, CF, Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography, 40, 8, 913-929 (2017) · doi:10.1111/ecog.02881
[45] Rue, H.; Martino, S.; Chopin, N., Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion), J. R. Stat. Soc. Series B Stat. Methodol., 71, 2, 319-392 (2009) · Zbl 1248.62156 · doi:10.1111/j.1467-9868.2008.00700.x
[46] Sakr, G.E., Elhajj, I.H., Mitri, G., Wejinya, U.C.: Artificial intelligence for forest fire prediction. In: 2010 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 1311-1316. (2010)
[47] Shidik, G.F., Mustofa, K.: Predicting size of forest fire using hybrid model. In: Linawati, Mahendra, M.S., Neuhold, E.J., Tjoa, A.M., You, I. (eds.) Information and Communication Technology, pp. 316-327. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)
[48] Shimura, T., Discretization of distributions in the maximum domain of attraction, Extremes, 15, 3, 299-317 (2012) · Zbl 1329.60159 · doi:10.1007/s10687-011-0137-7
[49] Simpson, D.; Rue, H.; Riebler, A.; Martins, TG; Sørbye, SH, Penalising model component complexity: A principled, practical approach to constructing priors, Stat. Sci., 32, 1, 1-28 (2017) · Zbl 1442.62060 · doi:10.1214/16-STS576
[50] Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25. Curran Associates, Inc (2012)
[51] Stewart, SI; Radeloff, VC; Hammer, RB; Hawbaker, TJ, Defining the Wildland-Urban Interface, J. Forest., 105, 4, 201-207 (2007)
[52] Taylor, SW; Woolford, DG; Dean, CB; Martell, DL, Wildfire prediction to inform fire management: Statistical science challenges, Stat. Sci., 28, 4, 586-615 (2013) · Zbl 1331.86029 · doi:10.1214/13-STS451
[53] Tonini, M.; Pereira, MG; Parente, J.; Orozco, CV, Evolution of forest fires in Portugal: from spatio-temporal point events to smoothed density maps, Nat. Hazards, 85, 3, 1489-1510 (2017) · doi:10.1007/s11069-016-2637-x
[54] Turkman, KF; Amaral Turkman, MA; Pereira, JM, Asymptotic models and inference for extremes of spatio-temporal data, Extremes, 13, 4, 375-397 (2010) · Zbl 1226.60083 · doi:10.1007/s10687-009-0092-8
[55] van Wagner, C., Conditions for the start and spread of crown fire, Can. J. For. Res., 7, 1, 23-34 (1977) · doi:10.1139/x77-004
[56] Velthoen, J., Dombry, C., Cai, J.-J., Engelke, S.: Gradient boosting for extreme quantile regression. Preprint at https://arxiv.org/abs/2103.00808 (2021)
[57] Vilar, L.; Woolford, DG; Martell, DL; Martín, MP, A model for predicting human-caused wildfire occurrence in the region of Madrid, Spain, Int. J. Wildland Fire, 19, 3, 325-337 (2010) · doi:10.1071/WF09030
[58] Wood, S.: Generalized Additive Models: an Introduction with R, 2nd edn. Chapman and Hall/CRC (2017) · Zbl 1368.62004
[59] Woolford, DG; Bellhouse, DR; Braun, WJ; Dean, CB; Martell, DL; Sun, J., A spatio-temporal model for people-caused forest fire occurrence in the Romeo Malette forest, J. Environ. Stat., 2, 1, 1-26 (2011)
[60] Xi, DD; Taylor, SW; Woolford, DG; Dean, C., Statistical models of key components of wildfire risk, Annu. Rev. Stat. Appl., 6, 197-222 (2019) · doi:10.1146/annurev-statistics-031017-100450
[61] Xie, Y.; Peng, M., Forest fire forecasting using ensemble learning approaches, Neural Comput. Appl., 31, 9, 4541-4550 (2019) · doi:10.1007/s00521-018-3515-0
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.