Compositional inverse Gaussian models with applications in compositional data analysis with possible zero observations. (English) Zbl 07862332

Summary: Compositional data (CoDa) appear in many fields, including biology, medicine, geology, chemistry, economics, ecology and sociology. Although Dirichlet and related models are frequently employed in CoDa analysis, they can perform unsatisfactorily, as the first real data example in the paper shows. First, this paper develops a multivariate compositional inverse Gaussian (CIG) model as a new tool for analysing CoDa. Using a stochastic representation (SR), an expectation-maximization (EM) algorithm, aided by a one-step gradient descent algorithm, is established to estimate the parameters of the proposed model. Second, because zero observations are often encountered in real CoDa analysis, the paper proposes a new model (called the ZCIG model) for CoDa with zeros, built on a novel mixture SR that combines the CIG random vector with a so-called zero-truncated product Bernoulli random vector. Corresponding statistical inference methods are developed for the cases both without and with covariates. Two real data sets are analysed to illustrate the proposed methods by comparing the CIG and ZCIG models with existing Dirichlet and logistic-normal models.
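
The summary does not state the stochastic representations themselves, so the following Python sketch is only an illustration under loudly stated assumptions: that a composition can be obtained by normalizing independent inverse Gaussian variates, that "zero-truncated product Bernoulli" means independent Bernoulli indicators conditioned on not all being zero, and that zeros enter by masking components before normalization. The function names and parameters (`mean`, `shape`, `p`) are hypothetical and need not match the paper's CIG or ZCIG definitions.

```python
import numpy as np


def sample_normalized_ig(rng, mean, shape, n):
    """Draw n compositions by normalizing independent inverse Gaussian
    variates (an ASSUMED construction, not necessarily the paper's CIG SR)."""
    mean = np.asarray(mean, dtype=float)
    shape = np.asarray(shape, dtype=float)
    # Generator.wald(mean, scale) samples the inverse Gaussian distribution;
    # the per-component parameters broadcast across the n rows.
    y = rng.wald(mean, shape, size=(n, mean.size))
    return y / y.sum(axis=1, keepdims=True)


def sample_zero_truncated_bernoulli(rng, p, n):
    """Draw n binary vectors with independent Bernoulli(p_j) entries,
    conditioned on at least one entry being 1 (ASSUMED reading of
    'zero-truncated product Bernoulli'); uses simple rejection sampling."""
    p = np.asarray(p, dtype=float)
    if not np.any(p > 0):
        raise ValueError("at least one probability must be positive")
    out = np.empty((n, p.size), dtype=int)
    filled = 0
    while filled < n:
        z = (rng.random((n, p.size)) < p).astype(int)
        z = z[z.sum(axis=1) > 0]          # reject all-zero vectors
        take = min(n - filled, len(z))
        out[filled:filled + take] = z[:take]
        filled += take
    return out


def sample_with_zeros(rng, mean, shape, p, n):
    """Mask inverse Gaussian variates with the indicators, then normalize
    (a HYPOTHETICAL mixture construction for compositions with zeros)."""
    z = sample_zero_truncated_bernoulli(rng, p, n)
    y = rng.wald(np.asarray(mean, dtype=float),
                 np.asarray(shape, dtype=float), size=z.shape) * z
    return y / y.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
x = sample_normalized_ig(rng, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], n=5)
xz = sample_with_zeros(rng, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.9, 0.5, 0.7], n=5)
```

Each row of `x` is strictly positive and sums to one; rows of `xz` also sum to one but may contain exact zeros. Conditioning on a non-zero indicator vector keeps the normalizing sum positive, so the division is always defined.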

MSC:

62F10 Point estimation
62H12 Estimation in multivariate analysis

Software:

DirichletReg
Full Text: DOI

References:

[1] Aitchison, J. The statistical analysis of compositional data: monographs on statistics and applied probability. London: Chapman and Hall; 1986. · Zbl 0688.62004
[2] Hijazi, RH. Analysis of compositional data using Dirichlet covariate models [PhD thesis]. Washington (DC): American University; 2003.
[3] Zhang, B. On compositional data modeling and its biomedical applications [PhD thesis]. New York: Columbia University; 2013.
[4] Connor, RJ, Mosimann, JE. Concepts of independence for proportions with a generalization of the Dirichlet distribution. J Am Stat Assoc. 1969;64(325):194-206. · Zbl 0179.24101
[5] Campbell, G, Mosimann, JE. Multivariate methods for proportional shape. ASA Proc Sect Stat Graph. 1987;1:10-17.
[6] Gueorguieva, R, Rosenheck, R, Zelterman, D. Dirichlet component regression and its applications to psychiatric data. Comput Stat Data Anal. 2008;52(12):5344-5355. · Zbl 1452.62066
[7] Morais, J, Thomas-Agnan, C, Simioni, M. Using compositional and Dirichlet models for market share regression. J Appl Stat. 2018;45(9):1670-1689. · Zbl 1516.62493
[8] Hijazi, RH, Jernigan, RW. Modelling compositional data using Dirichlet regression models. J Appl Probab Stat. 2009;4(1):77-91. · Zbl 1166.62053
[9] Tweedie, MCK. Statistical properties of inverse Gaussian distributions. I. Ann Math Stat. 1957;28(2):362-377. · Zbl 0086.35202
[10] Maier, MJ. DirichletReg: Dirichlet regression for compositional data in R. Vienna: WU Vienna University of Economics and Business; 2014. (Research Report Series / Department of Statistics and Mathematics 125).
[11] Bacon-Shone, J. Ranking methods for compositional data. J R Stat Soc Ser C-Appl Stat. 1992;41(3):533-537. · Zbl 0825.62387
[12] Fry, JM, Fry, TRL, McLaren, KR. Compositional data analysis and zeros in micro data. Appl Econ. 2000;32(8):953-959.
[13] Aitchison, J, Kay, J. Possible solutions of some essential zero problems in compositional data analysis. In: Thió-Henestrosa S, Martín-Fernández JA, editors. Proceedings of CoDaWork’03: The 1st Compositional Data Analysis Workshop. University of Girona; 2003.
[14] Scealy, JL, Welsh, AH. Regression for compositional data by using distributions defined on the hypersphere. J R Stat Soc Ser B-Stat Methodol. 2011;73(3):351-375. · Zbl 1411.62179
[15] Lijoi, A, Mena, RH, Prünster, I. Hierarchical mixture modeling with normalized inverse-Gaussian priors. J Am Stat Assoc. 2005;100(472):1278-1291. · Zbl 1117.62386
[16] Lange, K. A gradient algorithm locally equivalent to the EM algorithm. J R Stat Soc Ser B-Stat Methodol. 1995;57(2):425-437. · Zbl 0813.62021
[17] Akaike, H. Prediction and entropy. In: A Celebration of Statistics: The ISI Centenary Volume. Springer-Verlag; 1985.
[18] Burnham, KP, Anderson, DR. Model selection and multimodel inference: a practical information-theoretic approach. New York: Springer-Verlag; 2002. · Zbl 1005.62007
[19] Murtaugh, PA. In defense of p-values. Ecology. 2014;95(3):611-617.
[20] Kaiser, RF. Composition and origin of glacial till, Mexico and Kasoag quadrangles, New York. J Sediment Res. 1962;32(3):502-513.
[21] Barzilai, J, Borwein, JM. Two-point step size gradient methods. IMA J Numer Anal. 1988;8(1):141-148. · Zbl 0638.65055
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.