Variational supertrees for Bayesian phylogenetics. (English) Zbl 07906401

Summary: Bayesian phylogenetic inference is powerful but computationally intensive. Researchers may find themselves with two phylogenetic posteriors on overlapping data sets and may wish to approximate a combined result without having to re-run potentially expensive Markov chains on the combined data set. This raises the question: given overlapping subsets of a set of taxa (e.g. species or virus samples), and given posterior distributions on phylogenetic tree topologies for each of these taxon sets, how can we optimize a probability distribution on phylogenetic tree topologies for the entire taxon set? In this paper we develop a variational approach to this problem and demonstrate its effectiveness. Specifically, we develop an algorithm to find a suitable support of the variational tree topology distribution on the entire taxon set, as well as a gradient-descent algorithm to minimize the divergence from the restrictions of the variational distribution to each of the given per-subset probability distributions, in an effort to approximate the posterior distribution on the entire taxon set.
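The summary names two computational steps. The first chooses a support for the variational distribution on the full taxon set. The second runs gradient descent so that the restrictions of that distribution to each taxon subset match the given per-subset posteriors. The Python sketch below illustrates both steps on toy data. It is not the paper's algorithm, and the following choices are assumptions made here:

- Topologies are rooted and binary, written as nested tuples of leaf names.
- The support is found by brute force. Every rooted topology is enumerated, and a topology is kept only if its restriction to each subset X_i has positive mass under p_i. This is feasible only for a handful of taxa.
- The variational distribution is a plain softmax over that support.
- The divergence is KL(p_i || q restricted to X_i), summed over subsets.

All function names (`canon`, `restrict`, `all_rooted_topologies`, `candidate_support`, `restricted`, `objective`, `fit`) are hypothetical.

```python
import math


def canon(tree):
    """Canonical form of a rooted binary topology given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(c) for c in tree), key=repr))


def restrict(tree, taxa):
    """Restrict a canonical tree to `taxa`: prune other leaves, suppress unary nodes."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kids = [k for k in (restrict(c, taxa) for c in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(sorted(kids, key=repr))


def all_rooted_topologies(taxa):
    """All rooted binary topologies on `taxa` by stepwise leaf insertion (few taxa only)."""
    taxa = sorted(taxa)

    def insert(tree, leaf):
        yield (tree, leaf)  # attach `leaf` on the edge above `tree`
        if not isinstance(tree, str):
            left, right = tree
            for new_left in insert(left, leaf):
                yield (new_left, right)
            for new_right in insert(right, leaf):
                yield (left, new_right)

    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for old in trees for t in insert(old, leaf)]
    return [canon(t) for t in trees]


def candidate_support(taxa, subsets):
    """Full-taxon trees whose restriction to every X_i has positive mass under p_i."""
    return [t for t in all_rooted_topologies(taxa)
            if all(p.get(restrict(t, x), 0.0) > 0 for x, p in subsets)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def restricted(q, taxa):
    """Push a distribution {tree: prob} forward through restriction to `taxa`."""
    out = {}
    for tree, prob in q.items():
        key = restrict(tree, taxa)
        out[key] = out.get(key, 0.0) + prob
    return out


def objective(q, subsets):
    """Sum over i of KL(p_i || q restricted to X_i); this KL direction is an assumption."""
    total = 0.0
    for x, p in subsets:
        r = restricted(q, x)
        for t, pt in p.items():
            if pt > 0:
                if r.get(t, 0.0) == 0.0:
                    return math.inf
                total += pt * (math.log(pt) - math.log(r[t]))
    return total


def fit(taxa, subsets, steps=2000, lr=0.5):
    """Gradient descent on softmax logits over the candidate support."""
    subsets = [(frozenset(x), {canon(t): v for t, v in p.items()}) for x, p in subsets]
    support = candidate_support(taxa, subsets)
    logits = [0.0] * len(support)
    for _ in range(steps):
        q = softmax(logits)
        grad = [0.0] * len(support)
        for x, p in subsets:
            images = [restrict(t, x) for t in support]
            r = {}
            for prob, img in zip(q, images):
                r[img] = r.get(img, 0.0) + prob
            mass = sum(p.values())
            for k, img in enumerate(images):
                # derivative of -sum_t p(t) log r(t) with respect to logit k
                grad[k] += q[k] * mass - p.get(img, 0.0) * q[k] / r[img]
        logits = [v - lr * g for v, g in zip(logits, grad)]
    return dict(zip(support, softmax(logits)))
```

The toy example below passes taxa A to D and two subsets, with trees written as nested tuples:

- The first subset is {A, B, C}, with posterior {((A,B),C): 0.8, ((A,C),B): 0.2}.
- The second subset is {A, B, D}, with posterior {((A,B),D): 1.0}.

On this input, `fit("ABCD", subsets)` keeps 4 of the 15 rooted topologies and assigns probability 0.2 to (((A,C),B),D). The remaining 0.8 is split equally over the other three support trees. These three restrict identically to both subsets, and their gradients stay identical from uniform starting logits. `objective` expects subsets that are already canonicalized (frozenset taxa, keys passed through `canon`), as `fit` does internally.

An explicit softmax over enumerated topologies is used only for transparency. It scales poorly, because there are (2n-3)!! rooted binary topologies on n taxa.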

MSC:

92D15 Problems related to evolution
62F15 Bayesian inference

Software:

BEAST
