Document Zbl 1534.91136

Wang, Jun; Wen, Lihong; Xiao, Lu; Wang, Chaojie

Time-series forecasting of mortality rates using transformer. (English) Zbl 1534.91136

Scand. Actuar. J. 2024, No. 2, 109-123 (2024).

Summary: Predicting mortality rates is a crucial issue in life insurance pricing and demographic statistics. Traditional approaches, such as the Lee-Carter model and its variants, predict mortality trends using factor models, which explain the variation in mortality rates in terms of age, gender, region, and other factors. Recently, deep learning techniques have achieved great success in various tasks and shown strong potential for time-series forecasting. In this paper, we propose a modified Transformer architecture for predicting mortality rates in major countries around the world. Through the multi-head attention mechanism and positional encoding, the proposed Transformer model extracts key features effectively and thus achieves better performance in time-series forecasting. Using empirical data from the Human Mortality Database, we demonstrate that our Transformer model predicts mortality rates more accurately than the Lee-Carter model and other classic neural networks. Our model provides a powerful forecasting tool for insurance companies and policy makers.
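
For orientation, the Lee-Carter baseline named in the summary can be sketched in a few lines. This is a minimal illustration of the classic formulation (log m[x, t] ≈ a[x] + b[x] k[t], fitted by singular value decomposition, with k[t] projected as a random walk with drift) and is not the authors' code. The function names `fit_lee_carter` and `forecast_lee_carter` and the synthetic data are assumptions made for this example; real inputs would be death rates such as those in the Human Mortality Database.

```python
import numpy as np


def fit_lee_carter(log_m):
    """Fit log m[x, t] ~ a[x] + b[x] * k[t] by SVD.

    log_m: array of shape (ages, years) holding log central death rates.
    Returns (a, b, k) normalised so that sum(b) = 1 and sum(k) = 0.
    """
    a = log_m.mean(axis=1)                     # average log rate per age
    centered = log_m - a[:, None]              # rows now have zero mean over years
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b = u[:, 0]                                # leading age pattern
    k = s[0] * vt[0]                           # leading time index
    scale = b.sum()                            # rescale so that sum(b) = 1
    return a, b / scale, k * scale


def forecast_lee_carter(a, b, k, horizon):
    """Project k[t] as a random walk with drift and rebuild log rates."""
    drift = (k[-1] - k[0]) / (len(k) - 1)      # average yearly change in k
    k_future = k[-1] + drift * np.arange(1, horizon + 1)
    return a[:, None] + b[:, None] * k_future[None, :]


# Synthetic example (hypothetical data, not taken from the paper).
rng = np.random.default_rng(0)
ages, years = 5, 30
true_a = np.linspace(-6.0, -2.0, ages)
true_b = np.full(ages, 1.0 / ages)
true_k = np.linspace(10.0, -10.0, years)       # steadily improving mortality
noise = 0.01 * rng.standard_normal((ages, years))
log_m = true_a[:, None] + true_b[:, None] * true_k[None, :] + noise

a, b, k = fit_lee_carter(log_m)
pred = forecast_lee_carter(a, b, k, horizon=5)  # shape (ages, horizon)
```

The sketch keeps only the first singular component, which is the single-factor structure that the neural approaches discussed in the summary aim to improve upon.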

MSC:

91G05	Actuarial mathematics
91D20	Mathematical geography and demography
68T07	Artificial neural networks and deep learning

Keywords:

mortality rates; time series forecasting; deep learning; transformer

Software:

AlphaFold; BERT; word2vec; AlexNet; Tensor2Tensor; ImageNet

References:

[1]	Bahdanau, D., Cho, K. & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[2]	Booth, H., Maindonald, J. & Smith, L. (2002). Applying Lee-Carter under conditions of variable mortality decline. Population Studies 56(3), 325-336.
[3]	Brouhns, N., Denuit, M. & Vermunt, J. K. (2002). A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics 31(3), 373-393. · Zbl 1074.62524
[4]	Cairns, A. J., Blake, D. & Dowd, K. (2006). A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. Journal of Risk and Insurance 73(4), 687-718.
[5]	Cairns, A. J., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A. & Balevich, I. (2009). A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal 13(1), 1-35. · Zbl 1484.91376
[6]	Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[7]	Egmont-Petersen, M., de Ridder, D. & Handels, H. (2002). Image processing with neural networks - a review. Pattern Recognition 35(10), 2279-2301. · Zbl 1006.68884
[8]	He, K., Zhang, X., Ren, S. & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. P. 770-778. The Conference (CVPR 2016) was held in Las Vegas, USA.
[9]	Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
[10]	Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural Computation 9(8), 1735-1780.
[11]	Huang, Y., Shen, L. & Liu, H. (2019). Grey relational analysis, principal component analysis and forecasting of carbon emissions based on long short-term memory in China. Journal of Cleaner Production 209, 415-423.
[12]	Hüsken, M. & Stagge, P. (2003). Recurrent neural networks for time series classification. Neurocomputing 50, 223-235. · Zbl 1006.68822
[13]	Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A. & Bridgland, A. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), 583-589.
[14]	Kingma, D. P. & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[15]	Klenk, J., Keil, U., Jaensch, A., Christiansen, M. C. & Nagel, G. (2016). Changes in life expectancy 1950-2010: contributions from age- and disease-specific mortality in selected countries. Population Health Metrics 14(1), 1-11.
[16]	Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM 60(6), 84-90.
[17]	Lee, R. D. & Carter, L. R. (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association 87(419), 659-671. · Zbl 1351.62186
[18]	Li, H. & Lu, Y. (2017). Coherent forecasting of mortality rates: A sparse vector-autoregression approach. ASTIN Bulletin: The Journal of the IAA 47(2), 563-600. · Zbl 1390.62215
[19]	Li, J. & Wong, K. (2020). Incorporating structural changes in mortality improvements for mortality forecasting. Scandinavian Actuarial Journal 2020(9), 776-791. · Zbl 1454.91198
[20]	Lim, B. & Zohren, S. (2021). Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A 379(2194), 20200209.
[21]	Lindholm, M. & Palmborg, L. (2022). Efficient use of data for LSTM mortality forecasting. European Actuarial Journal 12, 749-778. · Zbl 1505.91334
[22]	Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[23]	Mnih, V., Heess, N., Graves, A. & Kavukcuoglu, K. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems 27, 2204-2212.
[24]	Nigri, A., Levantesi, S., Marino, M., Scognamiglio, S. & Perla, F. (2019). A deep learning integrated Lee-Carter model. Risks 7(1), 33.
[25]	Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A. & Tran, D. (2018). Image transformer. In International Conference on Machine Learning. P. 4055-4064. The Conference (ICML 2018) was held in Stockholm, Sweden.
[26]	Perla, F., Richman, R., Scognamiglio, S. & Wüthrich, M. V. (2021). Time-series forecasting of mortality rates using deep learning. Scandinavian Actuarial Journal 2021(7), 572-598. · Zbl 1471.91480
[27]	Perla, F. & Scognamiglio, S. (2023). Locally-coherent multi-population mortality modelling via neural networks. Decisions in Economics and Finance 46, 157-176. · Zbl 1518.91200
[28]	Pitacco, E., Denuit, M., Haberman, S. & Olivieri, A. (2009). Modelling longevity dynamics for pensions and annuity business. Oxford University Press. · Zbl 1163.91005
[29]	Richman, R. & Wüthrich, M. V. (2021). A neural network extension of the Lee-Carter model to multiple populations. Annals of Actuarial Science 15(2), 346-366.
[30]	Schnürch, S. & Korn, R. (2022). Point and interval forecasts of death rates using neural networks. ASTIN Bulletin: The Journal of the IAA 52(1), 333-360. · Zbl 1484.91404
[31]	Scognamiglio, S. (2022). Calibrating the Lee-Carter and the Poisson Lee-Carter models via neural networks. ASTIN Bulletin: The Journal of the IAA 52(2), 519-561. · Zbl 1492.91314
[32]	Shi, Y. (2021). Forecasting mortality rates with the adaptive spatial temporal autoregressive model. Journal of Forecasting 40(3), 528-546.
[33]	Tang, W., Long, G., Liu, L., Zhou, T., Blumenstein, M. & Jiang, J. (2022). Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification. In International Conference on Learning Representations. Chicago, USA.
[34]	Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. (2020). State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nature Communications 11(1), 1-11.
[35]	Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30, 5998-6008.
[36]	Wang, C., Chen, Y., Zhang, S. & Zhang, Q. (2022). Stock market index prediction using deep Transformer model. Expert Systems with Applications 208, 118128.
[37]	Wang, C.-W., Zhang, J. & Zhu, W. (2021). Neighbouring prediction for mortality. ASTIN Bulletin: The Journal of the IAA 51(3), 689-718. · Zbl 1480.91248
[38]	Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J. & Sun, L. (2022). Transformers in time series: A survey. arXiv preprint arXiv:2202.07125.
[39]	Xi, E., Bing, S. & Jin, Y. (2017). Capsule network performance on complex data. arXiv preprint arXiv:1712.03480.
[40]	Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H. & Zhang, W. (2021). Informer: beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. P. 11106-11115. The Conference (AAAI 2021) was held in Vancouver, Canada.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.