×

Goodness of fit for log-linear network models: dynamic Markov bases using hypergraphs. (English) Zbl 1400.62124

Summary: Social networks and other sparse data sets pose significant challenges for statistical inference, since many standard statistical methods for testing model/data fit are not applicable in such settings. Algebraic statistics offers a theoretically justified approach to goodness-of-fit testing that relies on the theory of Markov bases. Most current practices require the computation of the entire basis, which is infeasible in many practical settings. We present a dynamic approach to explore the fiber of a model, which bypasses this issue, and is based on the combinatorics of hypergraphs arising from the toric algebra structure of log-linear models. We demonstrate the approach on the Holland-Leinhardt \(p_1\) model for random directed graphs that allows for reciprocation effects.

MSC:

62H17 Contingency tables
13P25 Applications of commutative algebra (e.g., to statistics, control theory, optimization, etc.)
05C80 Random graphs (graph-theoretic aspects)
91D30 Social networks; opinion dynamics

References:

[1] Aoki, S., Takemura, A. (2003). Minimal basis for a connected Markov chain over \[3\times 3\times k3\]×3×k contingency tables with fixed two-dimensional marginals. Australian & New Zealand Journal of Statistics, 45(2), 229-249. · Zbl 1064.62068
[2] Aoki, S., Takemura, A. (2005). Markov chain Monte Carlo exact tests for incomplete two-way contingency tables. Journal of Statistical Computation and Simulation, 75(10), 787-812. · Zbl 1076.62060
[3] Aoki, S., Hara, H., Takemura, A. (2012). Markov bases in algebraic statistics. Springer Series in Statistics. New York: Springer. · Zbl 1304.62015
[4] Baird, D., Ulanowicz, R. (1989). The seasonal dynamics of the Chesapeake Bay ecosystem. Ecological Monographs, 59, 329-364.
[5] Bishop, Y. M., Fienberg, S. E., Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. New York: Springer. · Zbl 0332.62039
[6] Chatterjee, S., Diaconis, P., Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability, 21(4), 1400-1435. · Zbl 1234.05206
[7] Chen, Y., Dinwoodie, I. H., Sullivant, S. (2005). Sequential importance sampling for multiway tables. Annals of Statistics, 34, 523-545. · Zbl 1091.62051
[8] Csardi, G., Nepusz, T. (2006). The igraph software package for complex network research. International Journal of Complex Systems, 1695.
[9] Develin, M., Sullivant, S. (2003). Markov bases of binary graph models. Annals of Combinatorics, 7(4), 441-466. · Zbl 1037.05045
[10] Diaconis, P., Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distribution. Annals of Statistics, 26(1), 363-397. · Zbl 0952.62088
[11] Dinwoodie, I. H., Chen, Y. (2011). Sampling large tables with constraints. Statistica Sinica, 21, 1591-1609. · Zbl 1225.62012
[12] Dobra, A. (2003). Markov bases for decomposable graphical models. Bernoulli, 9(6), 1093-1108. · Zbl 1053.62072 · doi:10.3150/bj/1072215202
[13] Dobra, A. (2012). Dynamic Markov bases. Journal of Computational and Graphical Statistics, 21(12), 496-517.
[14] Dobra, A., Sullivant, S. (2004). A divide-and-conquer algorithm for generating Markov bases of multi-way tables. Computational Statistics, 19, 347-366. · Zbl 1063.62085
[15] Dobra, A., Fienberg, S. E., Rinaldo, A., Slavković, A., Zhou, Y. (2008). Algebraic statistics and contingency table problems: Log-linear models, likelihood estimation and disclosure limitation. Emerging applications of algebraic geometry (pp. 63-88). IMA. Volumes in Mathematics and its Applications, vol. 149, New York: Springer Verlag. · Zbl 1166.13033
[16] Drton, M., Sturmfels, B., Sullivant, S. (2009). Lectures on algebraic statistics, Oberwolfach Seminars, vol 39. Springer, Basel. doi:10.1007/978-3-7643-8905-5. · Zbl 1166.13001
[17] Fienberg, S. E., Wasserman, S. S. (1981). Discussion of Holland, P. W. and Leinhardt, S. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 54-57 (1981).
[18] Fienberg, S.E., Petrović, S., Rinaldo, A. (2010). Algebraic statistics for \[p_1\] p1 random graph models: Markov bases and their uses. Looking Back. Proceedings of a Conference in Honor of Paul W. Holland, chapter 1, Lecture Notes in Statistics—Proceedings, vol.202, New York: Springer.
[19] Goldenberg, A., Zheng, A. X., Fienberg, S. E., Airoldi, E. M. (2009). A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2), 129-233. · Zbl 1184.68030
[20] Gross, E., Petrović, S. (2013). Combinatorial degree bound for toric ideals of hypergraphs. International Journal of Algebra and Computation, 23(6), 1503-1520. · Zbl 1273.05237
[21] Gross, E., Petrović, S., Stasi, D. (2014). Goodness of fit for log-linear network models: supplementary material. http://math.iit.edu/ spetrov1/DynamicP1supplement/. Accessed 18 Mar 2016. · Zbl 1400.62124
[22] Haberman, S. J. (1981). An exponential family of probabilty distributions for directed graphs: Comment. Journal of the American Statistical Association, 76(373), 60-61.
[23] Hara, H., Takemura, A. (2010). Connecting tables with zero-one entries by a subset of a Markov basis. In M. Viana, H. Wynn (Eds.), Algebraic methods in statistics and probability II, contemporarymathematics (Vol. 516, pp. 199-213)., American Mathematical Society: Providence. · Zbl 1196.62073
[24] Hara, H., Takemura, A., Yoshida, R. (2009a). Markov bases for two-way subtable sum problems. Journal of Pure and Applied Algebra, 213(8), 1507-1521. · Zbl 1231.62104
[25] Hara, H., Takemura, A., Yoshida, R. (2009b). A Markov basis for conditional test of common diagonal effect in quasi-independence model for square contingency tables. Computational Statistics & Data Analysis, 53(4), 1006-1014. · Zbl 1452.62394
[26] Hara, H., Aoki, S., Takemura, A. (2010). Minimal and minimal invariant Markov bases of decomposable models for contingency tables. Bernoulli, 16(1), 208-233. · Zbl 1203.62110
[27] Hara, H., Aoki, S., Takemura, A. (2012). Running Markov chain without Markov basis. In T. Hibi (Ed.), Harmony of Gröbner bases and the modern industrial society. Singapore: World Scientific. · Zbl 1341.62150
[28] Haws, D., Martin del Campo, A., Takemura, A., Yoshida, R. (2014). Markov degree of the three-state toric homogeneous Markov chain model. Beiträge zur Algebra und Geometrie/Contributions to Algebra and Geometry, 55, 161-188. · Zbl 1415.60089
[29] Holland, P. W., Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76(373), 33-65. · Zbl 0457.62090
[30] Hunter, D. R., Goodreau, S. M., Handcock, M. S. (2008). Goodness of fit of social network models. Journal of the American Statistical Association, 103(481), 248-258. · Zbl 1471.62390
[31] Král, D., Norine, S., Pangrác, O. (2010). Markov bases of binary graph models of \[K_4\] K4-minor free graphs. Journal of Combinatorial Theory, Series A, 117(6), 759-765. · Zbl 1215.05173
[32] Kushimba, S., Chaggar, H., Gross, E., Kunyu, G. (2013). Social networks of mobey money in Kenya. In: Working Paper 2013-1, Institute for Money, Technology, and Financial Inclusion, Irvine.
[33] Norén, P. (2015). The three-state toric homogeneous Markov chain model has Markov degree two. Journal of Symbolic Computation, 68(2), 285-296. · Zbl 1315.13045 · doi:10.1016/j.jsc.2014.09.014
[34] Ogawa, M., Hara, H., Takemura, A. (2013). Graver basis for an undirected graph and its application to testing the beta model of random graphs. Annals of Institute of Statistical Mathematics, 65(1), 191-212. · Zbl 1440.13114
[35] Pajek (2004a). Food webs. http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm. Accessed 18 Mar 2016.
[36] Pajek (2004b). Sampson’s monastery dataset. http://vlado.fmf.uni-lj.si/pub/networks/data/esna/sampson.htm. Accessed 18 Mar 2016.
[37] Petrović, S., Stasi, D. (2014). Toric algebra of hypergraphs. Journal of Algebraic Combinatorics, 39(1), 187-208. · Zbl 1284.05130
[38] Petrović, S., Rinaldo, A., Fienberg, S.E. (2010). Algebraic statistics for a directed random graph model with reciprocation. In: M. A. G. Viana, H. Wynn (Eds.), Algebraic Methods in Statistics and Probability II, Contemporary Mathematics, vol. 516, American Mathematical Society. · Zbl 1197.13025
[39] R DCT (2005). R: a language and environment for statistical computing. http://www.R-project.org. Accessed 18 Mar 2016.
[40] Rapallo, F., Yoshida, R. (2010). Markov bases and subbases for bounded contingency tables. Annals of the Institute of Statistical Mathematics, 62(4), 785-805. · Zbl 1440.62220
[41] Robert, C., Casella, G. (1999). Monte Carlo statistical methods. In: Springer Texts in Statistics. New York: Springer. · Zbl 0935.62005
[42] Sampson, S.F. (1968). A novitiate in a period of change: an experimental and case study of relationships. PhD thesis, Department of Sociology, Cornell: Cornell University.
[43] Slavković, A. B. (2010). Partial information releases for confidential contingency table entries: Present and future research efforts. Journal of Privacy and Confidentiality, 1(2).
[44] Slavković, A. B., Zhu, X., Petrović, S. (2015). Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and markov bases. Annals of the Institute of Statistical Mathematics, 67(4), 621-648. · Zbl 1440.62221
[45] Sturmfels, B. (1996). Gröbner bases and convex polytopes., University Lecture Series. Providence: American Mathematical Society. · Zbl 0856.13020
[46] Sturmfels, B., Welker, V. (2012). Commutative algebra of statistical ranking. Journal of Algebra, 361, 264-286. · Zbl 1316.13042
[47] Villarreal, R. H. (2000). Monomial algebras., Monographs and Research Notes in Mathematics. Boca Raton: Chapman and Hall/CRC. · Zbl 1325.13004
[48] Yamaguchi, T., Ogawa, M., Takemura, A. (2013). Markov degree of the Birkhoff model. Journal of Algebraic Combinatorics, 38(4), 1-19. · Zbl 1302.62052
[49] 4ti2 T (2008) 4ti2: a software package for algebraic, geometric and combinatorial problems on linear spaces combinatorial problems on linear spaces. http://www.4ti2.de. Accessed 18 Mar 2016. · Zbl 1315.13045
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.