Document Zbl 1321.68244

Fang, Qiong; Ng, Wilfred; Feng, Jianlin; Li, Yuliang

Mining order-preserving submatrices from probabilistic matrices. (English) Zbl 1321.68244

ACM Trans. Database Syst. 39, No. 1, Article No. 6, 43 p. (2014).

MSC:

68P15	Database theory
15B51	Stochastic matrices
68T05	Learning and adaptive systems in artificial intelligence

Keywords:

OPSM mining; order-preserving submatrices; probabilistic matrices; probabilistic support

Software:

PrefixSpan

References:

[1]	Charu C. Aggarwal, Yan Li, Jianyong Wang, and Jin Wang. 2009. Frequent pattern mining with uncertain data. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 29–37.
[2]	Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, and Jong Soo Park. 1999. Fast algorithms for projected clustering. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’99). ACM, New York, NY, 61–72.
[3]	Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. 1998. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’98). ACM, New York, NY, 94–105.
[4]	Rakesh Agrawal and Ramakrishnan Srikant. 1995. Mining sequential patterns. In Proceedings of the IEEE 11th International Conference on Data Engineering (ICDE’95). IEEE Computer Society, Los Alamitos, CA, 3–14.
[5]	Mohammad Ahsanullah, Valery Nevzorov, and Mohammad Shakil. 2013. An Introduction to Order Statistics. Atlantis Studies in Probability and Statistics, Vol. 3, Atlantis Press. · Zbl 1276.62029 · doi:10.2991/978-94-91216-83-1
[6]	Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha. 2004. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04). ACM, New York, NY, 509–514. · Zbl 1222.68139
[7]	Amir Ben-Dor, Benny Chor, Richard Karp, and Zohar Yakhini. 2002. Discovering local structure in gene expression data: The order-preserving submatrix problem. In Proceedings of the 6th Annual International Conference on Research in Computational Molecular Biology. 49–57.
[8]	Stanislav Busygin, Gerrit Jacobsen, and Ewald Kramer. 2002. Double conjugated clustering applied to leukemia microarray data. In Proceedings of the 2nd SIAM ICDM Workshop on Clustering High Dimensional Data. SIAM, Philadelphia, PA.
[9]	Yizong Cheng and George M. Church. 2000. Biclustering of expression data. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology. 93–103.
[10]	Lin Cheung, Kevin Y. Yip, David W. Cheung, Ben Kao, and Michael K. Ng. 2007. On mining micro-array data by order-preserving submatrix. Int. J. Bioinform. Res. Appl. 3, 1 (2007), 42–64. · doi:10.1504/IJBRA.2007.011834
[11]	Burton Kuan Hui Chia and R. Krishna Murthy Karuturi. 2010. Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms. Algor. Molecular Bio. 5, 23 (2010).
[12]	Hyuk Cho, Inderjit S. Dhillon, Yuqiang Guan, and Suvrit Sra. 2004. Minimum sum-squared residue co-clustering of gene expression data. In Proceedings of the SIAM International Conference on Data Mining (SDM’04). SIAM, Philadelphia, PA, 114–125.
[13]	Chun Kit Chui, Ben Kao, Kevin Y. Yip, and Sau Dan Lee. 2008. Mining order-preserving submatrices from data with repeated measurements. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM’08). IEEE Computer Society, Los Alamitos, CA, 133–142.
[14]	Srivatsava Daruru, Nena Marín, Matt Walker, and Joydeep Ghosh. 2009. Pervasive parallelism in data mining: Dataflow solution to co-clustering large and sparse netflix data. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1115–1123.
[15]	Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. 2003. Information-theoretic co-clustering. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03). ACM, New York, NY, 89–98. · Zbl 1102.68545
[16]	Chris Ding, Tao Li, Wei Peng, and Haesun Park. 2006. Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, New York, NY, 126–135.
[17]	Qiong Fang, Wilfred Ng, and Jianlin Feng. 2010. Discovering significant relaxed order-preserving submatrices. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). ACM, New York, NY, 433–442.
[18]	Qiong Fang, Wilfred Ng, Jianlin Feng, and Yuliang Li. 2012. Mining bucket order-preserving submatrices in gene expression data. IEEE Trans. Knowl. Data Eng. 24, 12 (2012), 2218–2231. · doi:10.1109/TKDE.2011.180
[19]	Byron J. Gao, Obi L. Griffith, Martin Ester, and Steven J. M. Jones. 2006. Discovering significant OPSM subspace clusters in massive gene expression data. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06). ACM, New York, NY, 922–928.
[20]	Byron J. Gao, Obi L. Griffith, Martin Ester, Hui Xiong, Qiang Zhao, and Steven J. M. Jones. 2012. On the deep order-preserving submatrix problem: A best effort approach. IEEE Trans. Knowl. Data Eng. 24, 2 (2012), 309–325. · doi:10.1109/TKDE.2010.244
[21]	Gad Getz, Erel Levine, and Eytan Domany. 2000. Coupled two-way clustering analysis of gene microarray data. Proc. Natl. Acad. Sci. 97, 22 (2000), 12079–12084. · doi:10.1073/pnas.210134797
[22]	Tao Gu, Liang Wang, Zhanqing Wu, Xianping Tao, and Jian Lu. 2011. A pattern mining approach to sensor-based human activity recognition. IEEE Trans. Knowl. Data Eng. 23, 9 (2011), 1359–1372. · doi:10.1109/TKDE.2010.184
[23]	Stephan Günnemann, Ines Färber, Kittipat Virochsiri, and Thomas Seidl. 2012. Subspace correlation clustering: Finding locally correlated dimensions in subspace projections of the data. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). ACM, New York, NY, 352–360.
[24]	Neelima Gupta and Seema Aggarwal. 2010. MIB: Using mutual information for biclustering gene expression data. Pattern Recog. 43, 8 (2010), 2692–2697. · Zbl 1207.68281 · doi:10.1016/j.patcog.2010.03.002
[25]	Rohit Gupta, Navneet Rao, and Vipin Kumar. 2010. Discovery of error-tolerant biclusters from noisy gene expression data. In Proceedings of the 9th International Workshop on Data Mining in Bioinformatics (BIOKDD’10). ACM, New York, NY.
[26]	J. A. Hartigan. 1972. Direct clustering of a data matrix. J. Am. Stat. Assoc. 67, 337 (1972). · doi:10.1080/01621459.1972.10481214
[27]	Michael T. Heath. 2002. Scientific Computing: An Introductory Survey. McGraw-Hill Higher Education. · Zbl 0903.68072
[28]	Timothy R. Hughes, Matthew J. Marton, Allan R. Jones, et al. 2000. Functional discovery via a compendium of expression profiles. Cell 102, 1 (2000), 109–126.
[29]	Jens Humrich, Thomas Gärtner, and Gemma C. Garriga. 2011. A fixed parameter tractable integer program for finding the maximum order preserving submatrix. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11). IEEE Computer Society, Los Alamitos, CA, 1098–1103.
[30]	Trey Ideker, Vesteinn Thorsson, Jeffrey A. Ranish, R. Christmas, J. Buhler, J. Eng, R. Bumgarner, D. Goodlett, R. Aebersold, and L. Hood. 2001. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 5518 (2001), 929–934. · doi:10.1126/science.292.5518.929
[31]	Jan Ihmels, Gilgi Friedlander, Sven Bergmann, Ofer Sarig, Yaniv Ziv, and Naama Barkai. 2002. Revealing modular organization in the yeast transcriptional network. Nature Genetics 31, 4 (2002), 370–377.
[32]	Shuiwang Ji, Wenlu Zhang, and Jun Liu. 2012. A sparsity-inducing formulation for evolutionary co-clustering. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). ACM, New York, NY, 334–342.
[33]	Karin Kailing, Hans-Peter Kriegel, and Peer Kröger. 2004. Density-connected subspace clustering for high-dimensional data. In Proceedings of the 4th SIAM International Conference on Data Mining (SDM’04). SIAM, Philadelphia, PA, 246–257.
[34]	Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek. 2009. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl. Discov. Data 3, 1 (2009), 1–58. · doi:10.1145/1497577.1497578
[35]	Hye-Chung Kum, Jian Pei, Wei Wang, and Dean Duncan. 2003. ApproxMAP: Approximate mining of consensus sequential patterns. In Proceedings of the 3rd SIAM International Conference on Data Mining (SDM’03). SIAM, Philadelphia, PA, 311–315.
[36]	Mei-Ling Ting Lee, Frank C. Kuo, G. A. Whitmore, and Jeffrey Sklar. 2000. Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. 97, 18 (2000), 9834–9839. · Zbl 0955.92016 · doi:10.1073/pnas.97.18.9834
[37]	Guojun Li, Qin Ma, Haibao Tang, Andrew Paterson, and Ying Xu. 2009. QUBIC: A qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Res. 37, 15 (2009), e101.
[38]	Jian Li and Amol Deshpande. 2010. Ranking continuous probabilistic datasets. Proc. VLDB Endow. 3, 1 (2010), 638–649.
[39]	Jinze Liu and Wei Wang. 2003. OP-Cluster: Clustering by tendency in high dimensional space. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03). IEEE Computer Society, Los Alamitos, CA, 187–194.
[40]	Bo Long, Zhongfei Zhang, and Philip S. Yu. 2005. Co-clustering by block value decomposition. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05). ACM, New York, NY, 635–640.
[41]	S. C. Madeira, M. C. Teixeira, I. Sa-Correia, and A. L. Oliveira. 2010. Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 7, 1 (2010), 153–165. · doi:10.1109/TCBB.2008.34
[42]	Sara C. Madeira and Arlindo L. Oliveira. 2004. Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 1 (2004), 24–45. · doi:10.1109/TCBB.2004.2
[43]	Gabriela Moise and Jörg Sander. 2008. Finding non-redundant, statistically significant regions in high dimensional data: A novel approach to projected and subspace clustering. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08). ACM, New York, NY, 533–541.
[44]	T. M. Murali and S. Kasif. 2003. Extracting conserved gene expression motifs from gene expression data. In Proceedings of the Pacific Symposium on Biocomputing. 77–88. · Zbl 1219.92024
[45]	Muhammad Muzammal and Rajeev Raman. 2011. Mining sequential patterns from probabilistic databases. In Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’11). Lecture Notes in Computer Science, vol. 6635. Springer-Verlag, Berlin, 210–221.
[46]	Tung T. Nguyen, Richard R. Almon, Debra C. DuBois, William J. Jusko, and Ioannis P. Androulakis. 2010. Importance of replication in analyzing time-series gene expression data: Corticosteroid dynamics and circadian patterns in rat liver. BMC Bioinform. 11, 279 (2010).
[47]	Feng Pan, Xiang Zhang, and Wei Wang. 2008. CRD: Fast co-clustering on large datasets utilizing sampling-based matrix decomposition. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’08). ACM, New York, NY, 173–184.
[48]	Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, and Vipin Kumar. 2009. An association analysis approach to biclustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 677–686.
[49]	Lance Parsons, Ehtesham Haque, and Huan Liu. 2004. Subspace clustering for high dimensional data: A review. SIGKDD Explor. Newsl. 6, 1 (2004), 90–105. · doi:10.1145/1007730.1007731
[50]	Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. 2001. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the IEEE 17th International Conference on Data Engineering (ICDE’01). IEEE Computer Society, Los Alamitos, CA, 215–224.
[51]	Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. 2004. Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Trans. Knowl. Data Eng. 16, 11 (2004), 1424–1440.
[52]	Beatriz Pontes, Federico Divina, Raúl Giráldez, and J. S. Aguilar-Ruiz. 2010. Improved biclustering on expression data through overlapping control. Int. J. Intell. Comput. Cybernet. 3 (2010), 293–309. · Zbl 1214.68326 · doi:10.1108/17563781011049214
[53]	Amela Prelić, Stefan Bleuler, Philip Zimmermann, Anja Wille, P. Bühlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler. 2006. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22, 9 (2006), 1122–1129.
[54]	Parisa Rashidi, Diane J. Cook, Lawrence B. Holder, and Maureen Schmitter-Edgecombe. 2011. Discovering activities to recognize and track in a smart environment. IEEE Trans. Knowl. Data Eng. 23, 4 (2011), 527–539. · doi:10.1109/TKDE.2010.148
[55]	Christopher Ré, Julie Letchner, Magdalena Balazinska, and Dan Suciu. 2008. Event queries on correlated probabilistic streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’08). ACM, New York, NY, 715–728.
[56]	Chris Seidel. 2008. Introduction to DNA Microarrays. Wiley-VCH Verlag GmbH & Co. KGaA, 1–26.
[57]	Mohamed A. Soliman and Ihab F. Ilyas. 2009. Ranking with uncertain scores. In Proceedings of the IEEE 25th International Conference on Data Engineering (ICDE’09). IEEE Computer Society, Los Alamitos, CA, 317–328.
[58]	Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the International Conference on Extending Database Technology (EDBT’96). ACM, New York, NY, 3–17.
[59]	Amos Tanay, Roded Sharan, and Ron Shamir. 2002. Discovering statistically significant biclusters in gene expression data. Bioinformatics 18 (2002), 136–144. · doi:10.1093/bioinformatics/18.suppl_1.S136
[60]	Andrew C. Trapp and Oleg A. Prokopyev. 2010. Solving the order-preserving submatrix problem via integer programming. INFORMS J. Comput. 22, 3 (July 2010), 387–400. · Zbl 1243.90151 · doi:10.1287/ijoc.1090.0358
[61]	Hua Wang, Feiping Nie, Heng Huang, and Chris Ding. 2011. Nonnegative matrix tri-factorization based high-order co-clustering and its fast implementation. In Proceedings of the 11th IEEE International Conference on Data Mining (ICDM’11). IEEE Computer Society, Los Alamitos, CA, 774–783.
[62]	Evan Welbourne, Nodira Khoussainova, Julie Letchner, Yang Li, Magdalena Balazinska, Gaetano Borriello, and Dan Suciu. 2008. Cascadia: A system for specifying, detecting, and managing RFID events. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (MobiSys’08). ACM, New York, NY, 281–294.
[63]	Ka Yee Yeung, Mario Medvedovic, and Roger Bumgarner. 2003. Clustering gene-expression data with repeated measurements. Genome Biol. 4, 5 (2003).
[64]	Kevin Y. Yip, Ben Kao, Xinjie Zhu, Chun Kit Chui, Sau Dan Lee, and David W. Cheung. 2013. Mining order-preserving submatrices from data with repeated measurements. IEEE Trans. Knowl. Data Eng. 25, 7 (2013), 1587–1600. · doi:10.1109/TKDE.2011.167
[65]	Mengsheng Zhang, Wei Wang, and Jinze Liu. 2008. Mining approximate order preserving clusters in the presence of noise. In Proceedings of the IEEE 24th International Conference on Data Engineering (ICDE’08). IEEE Computer Society, Los Alamitos, CA, 160–168.
[66]	Zhou Zhao, Da Yan, and Wilfred Ng. 2012. Mining probabilistically frequent sequential patterns in uncertain databases. In Proceedings of the 15th International Conference on Extending Database Technology (EDBT’12). ACM, New York, NY, 74–85.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.