A macro-DAG structure based mixture model. (English) Zbl 1487.62068

Summary: In the context of unsupervised classification of multidimensional data, we revisit the classical mixture model in the case where the dependencies among the random variables are described by a DAG structure. This structure is considered at two levels: the original DAG and its macro-representation. This two-level representation forms the basis of the proposed mixture model. To perform unsupervised classification, we propose a dedicated algorithm called EM-mDAG, which extends the classical EM algorithm. In the Gaussian case, we show that this algorithm can be implemented efficiently. The approach has two main advantages: it favors the selection of a small number of classes, and it allows a semantic interpretation of the classes based on a clustering within the macro-variables.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
68T37 Reasoning under uncertainty in the context of artificial intelligence

References:

[1] Aghaeepour, Nima; Finak, Greg; Hoos, Holger; Mosmann, Tim R.; Brinkman, Ryan; Gottardo, Raphael; Scheuermann, Richard H., Critical assessment of automated flow cytometry data analysis techniques, Nat. Methods, 10, 3, 228-238 (2013), The FlowCAP Consortium, The DREAM Consortium
[2] Azencott, Robert; Chalmond, Bernard; Coldefy, François, Markov fusion of a pair of noisy images to detect intensity valleys, Int. J. Comput. Vis., 16, 135-145 (1995)
[3] Baum, Leonard E.; Petrie, Ted; Soules, George; Weiss, Norman, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat., 41, 1, 164-171 (1970) · Zbl 0188.49603
[4] Boedigheimer, Michael J.; Ferbas, John, Mixture modeling approach to flow cytometry data, Cytometry A, 73A, 421-429 (2008)
[5] Chalmond, Bernard, An iterative Gibbsian technique for the reconstruction of m-ary images, Pattern Recognit., 22, 747-761 (1989)
[6] Chalmond, Bernard, Modeling and Inverse Problems in Image Analysis, Applied Mathematical Sciences, vol. 155 (2003), Springer-Verlag · Zbl 1005.62081
[7] Chalmond, Bernard; Graffigne, Christine; Prenat, Michel; Roux, Michel, Contextual performance prediction for low-level image analysis algorithms, IEEE Trans. Image Process., 10, 1039-1046 (2001) · Zbl 1061.68569
[8] Chan, Cliburn; Feng, Feng; Ottinger, Janet; Foster, David; West, Mike; Kepler, Thomas B., Statistical mixture modeling for cell subtype identification in flow cytometry, Cytometry A, 73A, 693-701 (2008)
[9] Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urruti, Alejandra; Beitz, Benoit; Rouilly, Vincent; Duffy, Darragh; Patin, Etienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L.; Schwikowski, Benno, Automated flow cytometric analysis across large numbers of samples and cell types, Clin. Immunol. (2015), in press
[10] Dean, Nema; Raftery, Adrian E., Latent class analysis: variable selection, Ann. Inst. Statist. Math., 62, 11-35 (2010) · Zbl 1422.62085
[11] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc., Ser. B, 39, 1-38 (1977) · Zbl 0364.62022
[13] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome, The Elements of Statistical Learning (2009), Springer · Zbl 1273.62005
[14] Jones, Thouis R.; Carpenter, Anne E.; Lamprecht, Michael R.; Moffat, Jason; Silver, Serena J.; Grenier, Jennifer K.; Castoreno, Adam B.; Eggert, Ulrike S.; Root, David E.; Golland, Polina; Sabatini, David M., Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning, PNAS, 106, 6, 1826-1831 (2009)
[15] Kayabol, Koray; Zerubia, Josiane, Unsupervised amplitude and texture classification of SAR images with multinomial latent model, IEEE Trans. Image Process., 22, 2, 561-572 (2013) · Zbl 1373.94201
[16] Koski, Timo; Noble, John, Bayesian Networks: An Introduction, Wiley Series in Probability and Statistics (2009) · Zbl 1277.62022
[17] Ormerod, Michael, Flow Cytometry—A Basic Introduction (2014)
[18] Pyne, Saumyadipta; Hu, Xinli; Wang, Kui; Rossin, Elizabeth; Lin, Tsung-I.; Maier, Lisa M.; Baecher-Allan, Clare; McLachlan, Geoffrey J.; Tamayo, Pablo; Hafler, David A.; De Jager, Philip L.; Mesirov, Jill P., Automated high-dimensional flow cytometric data analysis, Proc. Natl. Acad. Sci., 106, 21, 8519-8524 (2009)
[19] Rao, C. Radhakrishna; Toutenburg, Helge, Linear Models: Least Squares and Alternatives (1995), Springer · Zbl 0846.62049
[20] Rencher, Alvin C., Methods of Multivariate Analysis (2002), John Wiley and Sons, Inc. · Zbl 0995.62056
[21] Su, Yu; Jurie, Frédéric, Improving image classification using semantic attributes, Int. J. Comput. Vis., 100, 59-77 (2012)
[22] Thiesson, Bo; Meek, Christopher; Chickering, David Maxwell; Heckerman, David, Learning mixtures of DAG models. Technical Report (1997), Microsoft Research
[24] van der Linde, Angelika, A Bayesian view of model complexity, Stat. Neerl., 66, 3, 253-271 (2012)
[25] Wasserman, Larry, All of Statistics: A Concise Course in Statistical Inference, Springer Texts in Statistics (2004) · Zbl 1053.62005