
Adaptive multiple importance sampling for Gaussian processes. (English) Zbl 07192021

Summary: In applications of Gaussian processes (GPs) where quantification of uncertainty is a strict requirement, it is necessary to accurately characterize the posterior distribution over GP covariance parameters. This is normally done by means of standard Markov chain Monte Carlo (MCMC) algorithms, which require repeated expensive calculations involving the marginal likelihood. Motivated by the desire to avoid the inefficiency of MCMC algorithms that reject a considerable number of expensive proposals, this paper develops an alternative inference framework based on adaptive multiple importance sampling (AMIS). In particular, the paper studies the application of AMIS to GPs in the case of a Gaussian likelihood, and proposes a novel pseudo-marginal AMIS algorithm for non-Gaussian likelihoods, in which the marginal likelihood is estimated unbiasedly. The results suggest that the proposed framework outperforms MCMC-based inference of covariance parameters in a wide range of scenarios.
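To make the adapted-proposal idea concrete, below is a minimal sketch of AMIS over GP covariance hyperparameters in the Gaussian-likelihood case, where the log marginal likelihood is available in closed form via a Cholesky factorization. The RBF kernel, the Gaussian prior on log-hyperparameters, the single moment-matched Gaussian proposal, and all function names are illustrative assumptions for this sketch, not the authors' exact configuration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(theta, X, y):
    # GP log marginal likelihood under a Gaussian likelihood:
    # log N(y | 0, K + sigma_n^2 I), computed via a Cholesky factorization.
    # theta = (log lengthscale, log signal var, log noise var); assumed parameterization.
    ell, sf2, sn2 = np.exp(theta)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # pairwise squared distances, X is (n, d)
    K = sf2 * np.exp(-0.5 * d2 / ell ** 2) + sn2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def log_target(theta, X, y):
    # Unnormalized log posterior over log-hyperparameters; the N(0, 2^2 I) prior is an assumption.
    return log_marginal_likelihood(theta, X, y) + multivariate_normal.logpdf(theta, np.zeros(3), 4.0 * np.eye(3))

def amis(X, y, n_iter=10, n_draws=200, dim=3, seed=0):
    rng = np.random.default_rng(seed)
    mu, Sigma = np.zeros(dim), 4.0 * np.eye(dim)            # initial Gaussian proposal
    draws, logps, proposals = [], [], []
    for _ in range(n_iter):
        theta = rng.multivariate_normal(mu, Sigma, size=n_draws)
        draws.append(theta)
        logps.append(np.array([log_target(t, X, y) for t in theta]))
        proposals.append((mu.copy(), Sigma.copy()))
        thetas = np.vstack(draws)
        logp = np.concatenate(logps)
        # AMIS recycling step: weight every draw so far against the equal-weight
        # mixture of ALL proposals used so far (deterministic-mixture weights).
        log_mix = np.logaddexp.reduce(
            [multivariate_normal.logpdf(thetas, m, S) for m, S in proposals], axis=0
        ) - np.log(len(proposals))
        logw = logp - log_mix
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Adapt the proposal by moment matching the weighted sample.
        mu = w @ thetas
        diff = thetas - mu
        Sigma = diff.T @ (w[:, None] * diff) + 1e-6 * np.eye(dim)
    return thetas, w                                        # weighted posterior sample

# Usage: thetas, w = amis(X, y); posterior means are w @ thetas.
```

For non-Gaussian likelihoods, the exact log_marginal_likelihood above would be replaced by an unbiased estimator of the marginal likelihood (e.g., importance sampling over the latent function values), which is the pseudo-marginal AMIS variant the paper proposes.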

MSC:

62-XX Statistics

Software:

PRMLT; NUTS; EGO; UCI-ml

References:

[1] Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge (MA): MIT Press; 2006. · Zbl 1177.68165
[2] Bishop CM. Pattern recognition and machine learning (information science and statistics). Secaucus (NJ): Springer-Verlag New York, Inc.; 2006. · Zbl 1107.68072
[3] Filippone M, Girolami M. Pseudo-marginal Bayesian inference for Gaussian processes. IEEE Trans Pattern Anal Mach Intell. 2014;36(11):2214-2226. doi: 10.1109/TPAMI.2014.2316530
[4] Filippone M, Marquand AF, Blain CRV, et al. Probabilistic prediction of neurological disorders with a statistical assessment of neuroimaging data modalities. Ann Appl Stat. 2012;6(4):1883-1905. doi: 10.1214/12-AOAS562 · Zbl 1257.62103
[5] Kim S, Valente F, Filippone M, et al. Predicting continuous conflict perception with Bayesian Gaussian processes. IEEE Trans Affect Comput. 2014;5(2):187-200. doi: 10.1109/TAFFC.2014.2324564
[6] Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. J Global Optim. 1998;13(4):455-492. doi: 10.1023/A:1008306431147 · Zbl 0917.90270
[7] Kennedy MC, O’Hagan A. Bayesian calibration of computer models. J R Stat Soc Ser B Stat Methodol. 2001;63(3):425-464. doi: 10.1111/1467-9868.00294 · Zbl 1007.62021
[8] Neal RM. Regression and classification using Gaussian process priors (with discussion). Bayesian Stat. 1999;6:475-501. · Zbl 0974.62072
[9] Taylor BM, Diggle PJ. INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes; 2012. arXiv:1202.1738 · Zbl 1453.62214
[10] Williams CKI, Barber D. Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell. 1998;20:1342-1351. doi: 10.1109/34.735807
[11] Opper M, Winther O. Gaussian processes for classification: mean-field algorithms. Neural Comput. 2000;12(11):2655-2684. doi: 10.1162/089976600300014881
[12] Kuss M, Rasmussen CE. Assessing approximate inference for binary Gaussian process classification. J Mach Learn Res. 2005;6:1679-1704. · Zbl 1190.62119
[13] Nickisch H, Rasmussen CE. Approximations for binary Gaussian process classification. J Mach Learn Res. 2008;9:2035-2078. · Zbl 1225.62087
[14] Hensman J, Matthews AG, Filippone M, et al. MCMC for variationally sparse Gaussian processes; 2015. arXiv:1506.04000.
[15] Murray I, Adams RP. Slice sampling covariance hyperparameters of latent Gaussian models. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, editors. Advances in neural information processing systems 23: 24th annual conference on neural information processing systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada: Curran Associates, Inc.; 2010. p. 1732-1740.
[16] Vanhatalo J, Vehtari A. Sparse log Gaussian processes via MCMC for spatial epidemiology. J Mach Learn Res. 2007;1:73-89.
[17] Filippone M, Zhong M, Girolami M. A comparative evaluation of stochastic-based inference methods for Gaussian process models. Mach Learn. 2013;93(1):93-114. doi: 10.1007/s10994-013-5388-x · Zbl 1294.62048
[18] Filippone M. Bayesian inference for Gaussian process classifiers with annealing and pseudo-marginal MCMC. In: 22nd international conference on pattern recognition, ICPR 2014, Stockholm, Sweden, August 24-28, 2014. IEEE; 2014. p. 614-619.
[19] Murray I, Graham MM. Pseudo-marginal slice sampling. eprint arXiv:1510.02958v1; 2015.
[20] Roberts GO, Gelman A, Gilks WR. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Probab. 1997;7:110-120. doi: 10.1214/aoap/1034625254 · Zbl 0876.60015
[21] Beskos A, Pillai N, Roberts GO, et al. Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli. 2013;19:1501-1534. doi: 10.3150/12-BEJ414 · Zbl 1287.60090
[22] Neal RM. Handbook of Markov chain Monte Carlo, chapter 5: MCMC using Hamiltonian dynamics. Boca Raton (FL): CRC Press; 2011.
[23] Andrieu C, Robert CP. Controlled MCMC for optimal sampling. Bernoulli. 2001;9:395-422.
[24] Cornuet JM, Marin JM, Mira A, et al. Adaptive multiple importance sampling. Scand J Statist. 2012;39:798-812. doi: 10.1111/j.1467-9469.2011.00756.x · Zbl 1319.62059
[25] Andrieu C, Roberts GO. The pseudo-marginal approach for efficient Monte Carlo computations. Ann Statist. 2009;37(2):697-725. doi: 10.1214/07-AOS574 · Zbl 1185.60083
[26] Pitt M, Silva R, Giordani P, et al. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J Econometrics. 2012;171:134-151. doi: 10.1016/j.jeconom.2012.06.004 · Zbl 1443.62499
[27] Tran MN, Scharth M, Pitt M, et al. Importance sampling squared for Bayesian inference in latent variable models. eprint arXiv:1309.3339; 2014.
[28] MacKay DJ. Bayesian non-linear modelling for the prediction competition. In: ASHRAE transactions, V.100, Pt.2. ASHRAE; 1994. p. 1053-1062.
[29] Neal RM. Probabilistic inference using Markov chain Monte Carlo methods. Dept. of Computer Science, University of Toronto; 1993. Report No.: CRG-TR-93-1.
[30] Papaspiliopoulos O, Roberts GO, Sköld M. A general framework for the parametrization of hierarchical models. Statist Sci. 2007;22(1):59-73. doi: 10.1214/088342307000000014 · Zbl 1246.62195
[31] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statist Sci. 1992;7(4):457-472. doi: 10.1214/ss/1177011136 · Zbl 1386.65060
[32] Flegal JM, Haran M, Jones GL. Markov chain Monte Carlo: can we trust the third significant figure? Statist Sci. 2008;23(2):250-260. doi: 10.1214/08-STS257 · Zbl 1327.62017
[33] Haario H, Saksman E, Tamminen J. Adaptive proposal distribution for random walk Metropolis algorithm. Comput Statist. 1999;14:375-395. doi: 10.1007/s001800050022 · Zbl 0941.62036
[34] Haario H, Saksman E, Tamminen J. An adaptive Metropolis algorithm. Bernoulli. 2001;7:223-242. doi: 10.2307/3318737 · Zbl 0989.65004
[35] Cappé O, Guillin A, Marin JM, et al. Population Monte Carlo. J Comput Graph Statist. 2004;13:907-929. doi: 10.1198/106186004X12803
[36] Doucet A, de Freitas N, Gordon N, editors. Sequential Monte Carlo methods in practice. New York: Springer-Verlag; 2001.
[37] Rubin D. Using the SIR algorithm to simulate posterior distributions. In: Bernardo JM, DeGroot MH, Lindley DV, Smith AFM, editors. Bayesian statistics 3. Oxford: Oxford University Press; 1988. p. 395-402. · Zbl 0713.62035
[38] Douc R, Guillin A, Marin JM, et al. Convergence of adaptive mixtures of importance sampling schemes. Ann Statist. 2007a;35:420-448. doi: 10.1214/009053606000001154 · Zbl 1132.60022
[39] Douc R, Guillin A, Marin JM, et al. Minimum variance importance sampling via population Monte Carlo. ESAIM Probab Stat. 2007b;11:427-447. doi: 10.1051/ps:2007028 · Zbl 1181.60028
[40] Oh MS, Berger JO. Adaptive importance sampling in Monte Carlo integration. J Stat Comput Simul. 1992;41:143-168. doi: 10.1080/00949659208810398 · Zbl 0781.65016
[41] Owen A, Zhou Y. Safe and effective importance sampling. J Amer Statist Assoc. 2000;95:135-143. doi: 10.1080/01621459.2000.10473909 · Zbl 0998.65003
[42] Ortiz L, Kaelbling L. Adaptive importance sampling for estimation in structured domains. In: Proceedings of the sixteenth annual conference on uncertainty in artificial intelligence (UAI-2000). San Francisco (CA); 2000. p. 446-454.
[43] Marin JM, Pudlo P, Sedki M. Consistency of the adaptive multiple importance sampling. eprint arXiv:1211.2548v2; 2014. · Zbl 1466.62157
[44] Metropolis N, Rosenbluth AW, Rosenbluth MN, et al. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21(6):1087-1092. doi: 10.1063/1.1699114 · Zbl 1431.65006
[45] Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97-109. doi: 10.1093/biomet/57.1.97 · Zbl 0219.65008
[46] Duane S, Kennedy AD, Pendleton BJ, et al. Hybrid Monte Carlo. Phys Lett B. 1987;195(2):216-222. doi: 10.1016/0370-2693(87)91197-X
[47] Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res. 2014;15(1):1593-1623. · Zbl 1319.60150
[48] Neal RM. Slice sampling. Ann Statist. 2003;31:705-767. doi: 10.1214/aos/1056562461 · Zbl 1051.65007
[49] Asuncion A, Newman DJ. UCI machine learning repository; 2007. Available from: https://archive.ics.uci.edu/ml/datasets.html.
[50] Šmídl V, Hofman R. Efficient sequential Monte Carlo sampling for continuous monitoring of a radiation situation. Technometrics. 2014;56(4):514-528. doi: 10.1080/00401706.2013.860917
[51] Kulhavý R. Recursive nonlinear estimation: a geometric approach. Berlin: Springer-Verlag GmbH; 1996. · Zbl 0860.62002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.