Predictive discrete latent factor models for large scale dyadic data

Published: 12 August 2007

Abstract

We propose a novel statistical method to predict large scale dyadic response variables in the presence of covariate information. Our approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable. We illustrate our method by working in a framework of generalized linear models, which include commonly used regression techniques like linear regression, logistic regression and Poisson regression as special cases. We also provide scalable generalized EM-based algorithms for model fitting using both "hard" and "soft" cluster assignments. We demonstrate the generality and efficacy of our approach through large scale simulation studies and analysis of datasets obtained from certain real-world movie recommendation and internet advertising applications.
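
To make the idea in the abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes a Gaussian response (the linear regression special case), a fully observed response matrix, and hard cluster assignments. The function name `fit_hard_block_regression`, the dense array layout, and the alternating least-squares updates are illustrative assumptions; the paper's generalized EM procedures, other link functions, and soft assignments are not reproduced here.

```python
import numpy as np

def fit_hard_block_regression(Y, X, k, l, n_iter=20, seed=0):
    """Illustrative sketch: y_ij ~ x_ij . beta + delta[row_cluster(i), col_cluster(j)].

    Y: (m, n) responses, X: (m, n, p) dyad covariates,
    k / l: assumed numbers of row / column clusters.
    """
    rng = np.random.default_rng(seed)
    m, n, p = X.shape
    rows = rng.integers(k, size=m)   # hard row-cluster labels
    cols = rng.integers(l, size=n)   # hard column-cluster labels
    delta = np.zeros((k, l))         # one offset per (row cluster, column cluster)
    X_flat = X.reshape(m * n, p)
    losses = []

    for _ in range(n_iter):
        # 1. Covariate effect: least squares on responses minus block offsets.
        offset = delta[rows][:, cols]
        beta, *_ = np.linalg.lstsq(X_flat, (Y - offset).ravel(), rcond=None)
        R = Y - X @ beta             # residual left for the latent structure

        # 2. Block offsets: mean residual inside each co-cluster.
        for I in range(k):
            for J in range(l):
                block = R[np.ix_(rows == I, cols == J)]
                delta[I, J] = block.mean() if block.size else 0.0

        # 3. Reassign each row to the cluster with the smallest squared error.
        row_cost = ((R[:, None, :] - delta[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)

        # 4. Reassign each column likewise, using the updated row labels.
        col_cost = ((R.T[:, None, :] - delta[rows].T[None, :, :]) ** 2).sum(axis=2)
        cols = col_cost.argmin(axis=1)

        losses.append(float(((R - delta[rows][:, cols]) ** 2).sum()))

    return beta, delta, rows, cols, losses
```

Each of the four steps minimizes the same squared-error objective over one group of parameters while holding the others fixed, so the recorded loss should not increase from one iteration to the next. Like other hard-assignment schemes, the sketch only finds a local optimum and depends on the random initial labels.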

      Published In

      KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2007
      1080 pages
      ISBN: 9781595936097
      DOI: 10.1145/1281192

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. co-clustering
      2. dyadic data
      3. generalized linear regression
      4. latent factor modeling

      Qualifiers

      • Article

      Conference

      KDD07

      Acceptance Rates

      KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
