
A transformation-free linear regression for compositional outcomes and predictors. (English) Zbl 1520.62197

Summary: Compositional data are common in many fields, both as outcomes and as predictor variables. Few models exist for the case where both the outcome and the predictor variables are compositional, and the existing models are often difficult to interpret in the compositional space because they rely on complex log-ratio transformations. We develop a transformation-free linear regression model in which the expected value of the compositional outcome is expressed as a single Markov transition from the compositional predictor. Our approach is based on estimating equations, so it does not require complete specification of the data likelihood and is robust to different data-generating mechanisms. Our model is simple to interpret, allows for 0s and 1s in both the compositional outcome and covariates, and subsumes several subcases of interest. We also develop permutation tests for linear independence and for equality of effect sizes of two components of the predictor. Finally, using two datasets from education and medical research, we show that, despite its simplicity, our model accurately captures the relationship between compositional data.
{© 2021 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.}
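
The "single Markov transition" in the summary can be read as E[y | x] = xᵀB, where x and y are compositions (non-negative, summing to 1) and B is a row-stochastic matrix. The sketch below illustrates that reading in Python with NumPy. The summary does not spell out the estimator, so the multiplicative update in `fit_em` is an assumption: an EM-style scheme for a Kullback-Leibler type loss, chosen only for illustration. The names `predict`, `fit_em`, and `kl_loss` are hypothetical and are not the API of the `codalm` package.

```python
import numpy as np

def predict(X, B):
    """Expected outcome compositions: each row of X times row-stochastic B."""
    return X @ B

def kl_loss(X, Y, B):
    """Cross-entropy part of the KL divergence between Y and X @ B."""
    mu = X @ B
    mask = Y > 0  # terms with y = 0 contribute nothing
    return -np.sum(Y[mask] * np.log(mu[mask]))

def fit_em(X, Y, n_iter=500):
    """Assumed EM-style update; B stays non-negative with rows summing to 1."""
    C, D = X.shape[1], Y.shape[1]
    B = np.full((C, D), 1.0 / D)  # start from the uniform transition matrix
    for _ in range(n_iter):
        mu = X @ B
        # Elementwise Y / mu, defined as 0 wherever mu is 0
        ratio = np.divide(Y, mu, out=np.zeros_like(Y), where=mu > 0)
        B = B * (X.T @ ratio)              # multiplicative (unnormalized) step
        B /= B.sum(axis=1, keepdims=True)  # renormalize each row
    return B

# Simulated, noise-free data; not taken from the paper
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(3), size=200)    # compositional predictors
B_true = rng.dirichlet(np.ones(4), size=3) # 3 x 4 row-stochastic matrix
Y = X @ B_true                             # compositional outcomes
B_hat = fit_em(X, Y)
```

Because every row of B sums to 1, any prediction `predict(X, B_hat)` is itself a composition, and zeros in X or Y need no special handling since no logarithm of the data is ever taken.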

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
