
Active learning with multiple localized regression models. (English) Zbl 1380.68332

Summary: Oftentimes businesses face the challenge of requiring costly information to improve the accuracy of prediction tasks. One notable example is obtaining informative customer feedback (e.g., customer-product ratings via costly incentives) to improve the effectiveness of recommender systems. In this paper, we develop a novel active learning approach, which aims to intelligently select informative training instances to be labeled so as to maximally improve the prediction accuracy of a real-valued prediction model. We focus on large, heterogeneous, and dyadic data, and on localized modeling techniques, which have been shown to model such data particularly well, as compared to a single, “global” model. Importantly, dyadic data with covariates is pervasive in contemporary big data applications such as large-scale recommender systems and search advertising. A key benefit of incorporating dyadic information is its simple, meaningful representation of heterogeneous data, in contrast to alternative local modeling techniques that typically produce complex and incomprehensible predictive patterns. We develop a computationally efficient active learning policy specifically tailored to exploit multiple local prediction models to identify informative acquisitions. Existing active learning policies are often computationally prohibitive for the setting we explore, and our policy makes the application of active learning computationally feasible for this setting. We present comprehensive empirical evaluations that demonstrate the benefits of our approach and explore its performance in challenging real-world domains.
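The general idea of selecting informative instances for a real-valued prediction model can be illustrated with a query-by-committee acquisition rule in the spirit of reference [6] below. This is a minimal sketch, not the paper's policy over multiple local models; all function names are hypothetical, and a single linear regressor refit on bootstrap resamples stands in for the committee.

```python
# Sketch: query-by-committee active learning for regression. A bootstrap
# committee of linear models is fit on the labeled data, and the unlabeled
# candidate on which the committee disagrees most (highest prediction
# variance) is selected for labeling.
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares fit with an intercept column."""
    Xb = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def qbc_select(X_lab, y_lab, X_pool, n_members=10):
    """Return the index of the pool instance with the highest
    committee disagreement (variance of member predictions)."""
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X_lab), len(X_lab))  # bootstrap resample
        w = fit_linear(X_lab[idx], y_lab[idx])
        preds.append(predict_linear(w, X_pool))
    return int(np.argmax(np.var(preds, axis=0)))

# Usage: with labeled points on [0, 1], an extrapolation candidate is
# selected over an interpolation candidate.
X_lab = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_lab = 2.0 * X_lab.ravel() + rng.normal(0.0, 0.1, 20)
X_pool = np.array([[0.5], [5.0]])
chosen = qbc_select(X_lab, y_lab, X_pool)
```

The design choice here is that disagreement among resampled models is a cheap proxy for expected error reduction; the paper's contribution is making such acquisition computationally feasible when predictions come from many localized models rather than one global one.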

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62J02 General nonlinear regression
62P20 Applications of statistics to economics
Full Text: DOI

References:

[1] Abe N, Mamitsuka H (1998) Query learning strategies using boosting and bagging. Proc. ICML ’98 (Morgan Kaufmann Publishers, San Francisco), 1-9.Google Scholar
[2] Agarwal D, Merugu S (2007) Predictive discrete latent factor models for large scale dyadic data. Proc. 13th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 26-35.Crossref, Google Scholar · doi:10.1145/1281192.1281199
[3] Baumann T, Germond AJ (1993) Application of the Kohonen network to short-term load forecasting. Proc. ANNPS ’93 (IEEE Computer Society, Washington, DC), 407-412.Crossref, Google Scholar · doi:10.1109/ANN.1993.264313
[4] Bilgic M, Bennett PN (2012) Active query selection for learning rankers. Proc. 35th Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 1033-1034.Crossref, Google Scholar · doi:10.1145/2348283.2348455
[5] Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and Regression Trees (Wadsworth, Belmont, CA).Google Scholar · Zbl 0541.62042
[6] Burbidge R, Rowland JJ, King RD (2007) Active learning for regression based on query by committee. Proc. Intelligent Data Engrg. Automated Learn. (IDEAL) (Springer, New York), 209-218.Crossref, Google Scholar · doi:10.1007/978-3-540-77226-2_22
[7] Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Machine Learn. 15(2):201-221.Crossref, Google Scholar · doi:10.1007/BF00993277
[8] Cohn D, Ghahramani Z, Jordan M (1996) Active learning with statistical models. J. Artificial Intelligence Res. 4:129-145.Crossref, Google Scholar · Zbl 0900.68366 · doi:10.1613/jair.295
[9] Deodhar M, Ghosh J (2008) Simultaneous co-segmentation and predictive modeling for large, temporal marketing data. Proc. Data Mining Design Marketing, ICDM 2008 Workshop (IEEE Computer Society, Washington, DC), 806-815.Crossref, Google Scholar · doi:10.1109/ICDMW.2008.17
[10] Deodhar M, Ghosh J (2010) SCOAL: A framework for simultaneous co-clustering and learning from complex data. J. ACM Trans. Knowledge Discovery from Data 4(3):Article no. 10.Google Scholar
[11] Djukanovic M, Babic B, Sobajic D, Pao Y (1993) Unsupervised/supervised learning concept for 24-hour load forecasting. IEE Proc.-Generation, Transmission Distribution 140(4):311-318.Crossref, Google Scholar · doi:10.1049/ip-c.1993.0046
[12] Fedorov V (1972) Theory of Optimal Experiments (Academic Press, New York).Google Scholar
[13] Fukumizu K (2000) Statistical active learning in multilayer perceptrons. IEEE Trans. Neural Networks 11(1):17-26.Crossref, Google Scholar · doi:10.1109/72.822506
[14] Gill P, Murray W, Wright M (1981) Practical Optimization (Academic Press, London).Google Scholar · Zbl 0503.90062
[15] Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning (Springer, New York).Crossref, Google Scholar · Zbl 0973.62007 · doi:10.1007/978-0-387-21606-5
[16] Huang Z (2007) Selectively acquiring ratings for product recommendation. Proc. 9th Internat. Conf. Electronic Commerce (ICEC ’07) (ACM, New York), 379-388.Crossref, Google Scholar · doi:10.1145/1282100.1282171
[17] Kanamori T, Shimodaira H (2003) Active learning algorithm using the maximum weighted log-likelihood estimator. J. Statist. Planning Inference 116(1):149-162.Crossref, Google Scholar · Zbl 1020.62065 · doi:10.1016/S0378-3758(02)00234-3
[18] Kiefer J (1959) Optimum experimental designs. J. Roy. Statist. Soc. Ser. B 21(2):272-304.Google Scholar · Zbl 0108.15303
[19] Kim B, Sullivan M (1998) The effect of parent brand experience on line extension trial and repeat purchase. Marketing Lett. 9(2):181-193.Crossref, Google Scholar · doi:10.1023/A:1007961016262
[20] Kohavi R, Longbotham R, Sommerfield D, Henne R (2009) Controlled experiments on the web: Survey and practical guide. Data Mining Knowledge Discovery 18(1):140-181.Crossref, Google Scholar · doi:10.1007/s10618-008-0114-1
[21] Koren Y (2008) Factorization meets the neighborhood: A multifaceted collaborative filtering model. Proc. 14th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 426-434.Crossref, Google Scholar · doi:10.1145/1401890.1401944
[22] Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. Croft BW, van Rijsbergen CJ, eds. Proc. SIGIR ’94 (Springer, London), 3-12.Crossref, Google Scholar · doi:10.1007/978-1-4471-2099-5_1
[23] Liu TY (2011) Learning to Rank for Information Retrieval (Springer, New York).Crossref, Google Scholar · Zbl 1227.68002 · doi:10.1007/978-3-642-14267-3
[24] Long B, Bian J, Chapelle O, Zhang Y, Inagaki Y, Chang Y (2015) Active learning for ranking through expected loss optimization. IEEE Trans. Knowledge Data Engrg. 27(5):1180-1191.Crossref, Google Scholar · doi:10.1109/TKDE.2014.2365785
[25] Melville P, Saar-Tsechansky M, Provost F, Mooney R (2005) An expected utility approach to active feature-value acquisition. Proc. ICDM ’05 (IEEE Computer Society, New York).Crossref, Google Scholar · doi:10.1109/ICDM.2005.23
[26] Menon AK, Jiang X, Kim J, Vaidya J, Ohno-Machado L (2013) Detecting inappropriate access to electronic health records using collaborative filtering. Machine Learn. 95(1):87-101.Crossref, Google Scholar · doi:10.1007/s10994-013-5376-1
[27] Quinlan JR (1992) Learning with continuous classes. Proc. AI ’92 (World Scientific, Singapore), 343-348.Google Scholar
[28] RayChaudhuri T, Hamey LGC (1995) Minimisation of data collection by active learning. Proc. ICNN ’95 (IEEE, Piscataway, NJ), 1338-1341.Google Scholar
[29] Roy N, McCallum AK (2001) Toward optimal active learning through sampling estimation of error reduction. Proc. ICML ’01 (Morgan Kaufmann Publishers, San Francisco), 441-448.Google Scholar
[30] Rubens N, Sugiyama M (2007) Influence-based collaborative active learning. Proc. RecSys ’07 (ACM, New York), 145-148.Crossref, Google Scholar · doi:10.1145/1297231.1297257
[31] Saar-Tsechansky M, Provost F (2004) Active sampling for class probability estimation and ranking. Machine Learn. 54(2):153-178.Crossref, Google Scholar · Zbl 1057.68089 · doi:10.1023/B:MACH.0000011806.12374.c3
[32] Saar-Tsechansky M, Melville P, Provost F (2009) Active feature-value acquisition. Management Sci. 55(4):664-684.Link, Google Scholar
[33] Seetharaman PB, Ainslie A, Chintagunta PK (1999) Investigating household state dependence effects across categories. J. Marketing Res. 36(4):488-500.Crossref, Google Scholar · doi:10.2307/3152002
[34] Settles B (2012) Active Learning (Synthesis Lectures on Artificial Intelligence and Machine Learning) (Morgan and Claypool Publishers, San Rafael, CA).Google Scholar · Zbl 1270.68006
[35] Seung HS, Opper M, Sompolinsky H (1992) Query by committee. Proc. COLT ’92 (ACM, New York), 287-294.Crossref, Google Scholar · doi:10.1145/130385.130417
[36] Sugiyama M (2006) Active learning in approximately linear regression based on conditional expectation of generalization error. J. Machine Learn. Res. 7:141-166.Google Scholar · Zbl 1222.68311
[37] Sugiyama M, Nakajima S (2009) Pool-based active learning in approximate linear regression. Machine Learn. 75(3):249-274.Crossref, Google Scholar · Zbl 1470.68181 · doi:10.1007/s10994-009-5100-3
[38] Sugiyama M, Rubens N (2008) Active learning with model selection in linear regression. Proc. SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 518-529.Crossref, Google Scholar · doi:10.1137/1.9781611972788.47
[39] Wang Y, Witten IH (1997) Inducing model trees for continuous classes. van Someren M, Widmer G, eds. Proc. ECML ’97, Prague.Google Scholar
[40] Wedel M, Steenkamp J (1991) A clusterwise regression method for simultaneous fuzzy market structuring and benefit segmentation. J. Marketing Res. 28(4):385-396.Crossref, Google Scholar · doi:10.2307/3172779
[41] Wiens D (2000) Robust weights and designs for biased regression models: Least squares and generalized m-estimation. J. Statist. Planning Inference 83(2):395-412.Crossref, Google Scholar · Zbl 0976.62075 · doi:10.1016/S0378-3758(99)00102-0
[42] Zhang C, Chen T (2002) An active learning framework for content-based information retrieval. IEEE Trans. Multimedia 4(2):260-268.Crossref, Google Scholar · doi:10.1109/TMM.2002.1017738
[43] Zheng Z,