Score matching for compositional distributions. (English) Zbl 07751810

Summary: Compositional data are challenging to analyse due to the non-negativity and sum-to-one constraints on the sample space. With real data, it is often the case that many of the compositional components are highly right-skewed, with large numbers of zeros. Major limitations of currently available models for compositional data include one or more of the following: insufficient flexibility in terms of distributional shape; difficulty in accommodating zeros in the data in estimation; and lack of computational viability in moderate to high dimensions. In this article, we propose a new model, the polynomially tilted pairwise interaction (PPI) model, for analysing compositional data. Maximum likelihood estimation is difficult for the PPI model. Instead, we propose novel score matching estimators, which entails extending the score matching approach to Riemannian manifolds with boundary. These new estimators are available in closed form and simulation studies show that they perform well in practice. As our main application, we analyse real microbiome count data with fixed totals using a multinomial latent variable model with a PPI model for the latent variable distribution. We prove that, under certain conditions, the new score matching estimators are consistent for the parameters in the new multinomial latent variable model.
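
As a hedged illustration of why score matching can yield closed-form estimators, the sketch below applies the original Euclidean score matching objective of Hyvärinen (2005) to a one-dimensional Gaussian written in natural parameters. It is not the PPI estimator of the paper and ignores the simplex boundary entirely; the function name `score_matching_gaussian` and the toy model are assumptions made purely for illustration. For an unnormalised model that is linear in its parameters, the objective is quadratic, so the estimate solves a small linear system.

```python
def score_matching_gaussian(xs):
    """Closed-form score matching fit of a 1-D Gaussian (illustrative only).

    Assumed model: log p(x) = eta1 * x + eta2 * x**2 + const, with eta2 < 0.
    The score is d/dx log p = eta1 + 2*eta2*x and its derivative is 2*eta2,
    so the empirical score matching objective is
        mean[ 2*eta2 + 0.5 * (eta1 + 2*eta2*x)**2 ],
    a quadratic in (eta1, eta2). No normalising constant is needed.
    """
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment
    m2 = sum(x * x for x in xs) / n   # second sample moment

    # Setting the gradient to zero gives W @ theta = r with
    #   W = [[1, 2*m1], [2*m1, 4*m2]],  r = [0, -2].
    det = 4.0 * (m2 - m1 * m1)        # determinant of W
    if det <= 0.0:
        raise ValueError("need at least two distinct observations")

    # Solve the 2x2 system by Cramer's rule.
    eta1 = 4.0 * m1 / det
    eta2 = -2.0 / det

    # Convert natural parameters back to mean and variance.
    mean = -eta1 / (2.0 * eta2)
    variance = -1.0 / (2.0 * eta2)
    return eta1, eta2, mean, variance


if __name__ == "__main__":
    sample = [1.0, 2.0, 4.0, 7.0]
    print(score_matching_gaussian(sample))
```

In this toy case the estimates coincide with the sample mean and the (biased) sample variance. The estimators summarised above extend this idea to a manifold with boundary, which this sketch does not attempt.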

MSC:

62-XX Statistics

References:

[1] Aitchison, J., The Statistical Analysis of Compositional Data, Monographs on Statistics and Applied Probability, 25 (1986), London: Chapman & Hall · Zbl 0688.62004
[2] Bear, J.; Billheimer, D., A Logistic Normal Mixture Model for Compositional Data Allowing Essential Zeros, Austrian Journal of Statistics, 45, 3-23 (2016) · doi:10.17713/ajs.v45i4.117
[3] Billheimer, D.; Guttorp, P.; Fagan, W. F., Statistical interpretation of species composition, Journal of the American Statistical Association, 96, 1205-1214 (2001) · Zbl 1073.62573 · doi:10.1198/016214501753381850
[4] Butler, A.; Glasbey, C., A Latent Gaussian Model for Compositional Data With Zeros, Applied Statistics, 57, 505-520 (2008) · doi:10.1111/j.1467-9876.2008.00627.x
[5] Hyvärinen, A., Estimation of Non-Normalised Statistical Models by Score Matching, Journal of Machine Learning Research, 6, 695-709 (2005) · Zbl 1222.62051
[6] Hyvärinen, A., Some Extensions of Score Matching, Computational Statistics and Data Analysis, 51, 2499-2512 (2007) · Zbl 1161.62326
[7] Krzysztofowicz, R.; Reese, S., Stochastic Bifurcation Processes and Distributions of Fractions, Journal of the American Statistical Association, 88, 345-354 (1993) · Zbl 0777.62019
[8] Leininger, T. J.; Gelfand, A. E.; Allen, J. M.; Silander Jr., J. A., Spatial Regression Modeling for Compositional Data With Many Zeros, Journal of Agricultural, Biological, and Environmental Statistics, 18, 314-334 (2013) · Zbl 1303.62085 · doi:10.1007/s13253-013-0145-y
[9] Li, H., Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis, The Annual Review of Statistics and Its Application, 2, 73-94 (2015) · doi:10.1146/annurev-statistics-010814-020351
[10] Liu, S.; Kanamori, T.; Williams, D. J., Estimating Density Models with Truncation Boundaries (2020)
[11] Mardia, K. V., A New Estimation Methodology for Standard Directional Distributions, 2018 21st International Conference on Information Fusion (FUSION), 724-729 (2018), New York: IEEE
[12] Mardia, K. V.; Kent, J. T.; Laha, A. K., Score Matching Estimators for Directional Distributions (2016)
[13] Martin, I.; Uh, H-W.; Supali, T.; Mitreva, M.; Houwing-Duistermaat, J. J., The Mixed Model for the Analysis of a Repeated-Measurement Multivariate Count Data, Statistics in Medicine, 38, 2248-2268 (2018) · doi:10.1002/sim.8101
[14] Ongaro, A.; Migliorati, S.; Ascari, R., A New Mixture Model on the Simplex, Statistics and Computing, 30, 749-770 (2020) · Zbl 1447.62133 · doi:10.1007/s11222-019-09920-x
[15] Scealy, J. L.; Welsh, A. H., Regression for Compositional Data by Using Distributions Defined on the Hypersphere, Journal of the Royal Statistical Society, Series B, 73, 351-375 (2011) · Zbl 1411.62179 · doi:10.1111/j.1467-9868.2010.00766.x
[16] Scealy, J. L.; Welsh, A. H., Fitting Kent Models to Compositional Data With Small Concentration, Statistics and Computing, 24, 165-179 (2014) · Zbl 1325.62049
[17] Scealy, J. L.; Welsh, A. H., A Directional Mixed Effects Model for Compositional Expenditure Data, Journal of the American Statistical Association, 112, 24-36 (2017)
[18] Stewart, C.; Field, C. A., Managing the Essential Zeros in Quantitative Fatty Acid Signature Analysis, Journal of Agricultural, Biological, and Environmental Statistics, 16, 45-69 (2010) · Zbl 1306.62237 · doi:10.1007/s13253-010-0040-8
[19] Takasu, Y.; Yano, K.; Komaki, F., Scoring Rules for Statistical Models on Spheres, Statistics and Probability Letters, 138, 111-115 (2018) · Zbl 1463.62158 · doi:10.1016/j.spl.2018.02.054
[20] Tsagris, M.; Stewart, C., A Folded Model for Compositional Data Analysis, Australian and New Zealand Journal of Statistics, 62, 249-277 (2020) · Zbl 1521.62081 · doi:10.1111/anzs.12289
[21] Yu, S.; Drton, M.; Shojaie, A., Generalised Score Matching for Non-Negative Data, Journal of Machine Learning Research, 20, 1-70 (2019) · Zbl 1489.62082
[22] Yu, S.; Drton, M.; Shojaie, A., Generalised Score Matching for General Domains (2020)
[23] Zhang, J.; Lin, W., Scalable Estimation and Regularization for the Logistic Normal Multinomial Model, Biometrics, 75, 1098-1108 (2019) · Zbl 1448.62197 · doi:10.1111/biom.13071
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.