
On approximating weighted sums with exponentially many terms. (English) Zbl 1074.68050

Summary: Multiplicative weight-update algorithms such as Winnow and Weighted Majority have been studied extensively due to their on-line mistake bounds’ logarithmic dependence on \(N\), the total number of inputs, which allows them to be applied to problems where \(N\) is exponentially large. However, a large \(N\) requires techniques to efficiently compute the weighted sums of inputs to these algorithms. In special cases the weighted sum can be computed exactly and efficiently, but for numerous problems such an approach seems infeasible. Thus we explore applications of Markov chain Monte Carlo (MCMC) methods to estimate the total weight. Our methods are very general and applicable to any representation of a learning problem for which the inputs to a linear learning algorithm can be represented as states in a completely connected, untruncated Markov chain. We give theoretical worst-case guarantees on our technique and then apply it to two problems: learning DNF formulas using Winnow, and pruning classifier ensembles using Weighted Majority. We then present empirical results on simulated data indicating that in practice, the time complexity is much better than what is implied by our worst-case theoretical analysis.
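The core difficulty the summary describes, evaluating a weighted sum over exponentially many terms, can be illustrated with a minimal sketch of the standard MCMC product-of-ratios estimator in the spirit of Jerrum and Sinclair [16]. This is an illustrative toy, not the authors' algorithm: the function name `estimate_total_weight`, the temperature schedule, and the sample counts are all assumptions, and it uses a single-bit-flip Metropolis walk for simplicity rather than the completely connected chain the paper assumes.

```python
import math
import random

def estimate_total_weight(log_w, n, betas, samples=2000, seed=0):
    """Estimate W = sum over x in {0,1}^n of exp(log_w(x)) by the classic
    product-of-ratios scheme: telescope from the uniform distribution
    (beta = 0, where the total is exactly 2^n) through tempered
    distributions pi_beta(x) proportional to exp(beta * log_w(x))."""
    rng = random.Random(seed)
    log_total = n * math.log(2)                  # Z(0) = 2^n exactly
    x = [rng.randint(0, 1) for _ in range(n)]    # uniform start = pi_0
    for b_prev, b_next in zip(betas, betas[1:]):
        acc = 0.0
        for _ in range(samples):
            # One Metropolis step targeting pi_{b_prev}: flip one bit.
            # (Single-bit-flip walk chosen here for simplicity; the
            # paper instead assumes a completely connected chain.)
            i = rng.randrange(n)
            y = x[:]
            y[i] ^= 1
            if math.log(rng.random() + 1e-300) < b_prev * (log_w(y) - log_w(x)):
                x = y
            # Ratio estimator: Z(b_next)/Z(b_prev) equals
            # E_{pi_{b_prev}}[exp((b_next - b_prev) * log_w(x))]
            acc += math.exp((b_next - b_prev) * log_w(x))
        log_total += math.log(acc / samples)
    return math.exp(log_total)

# Winnow-style weights w(x) = alpha^(number of ones in x); the exact
# total over the hypercube factorizes as (1 + alpha)^n, so the
# estimate can be checked directly.
alpha, n = 2.0, 10
log_w = lambda x: sum(x) * math.log(alpha)
betas = [k / 10 for k in range(11)]              # 0.0, 0.1, ..., 1.0
est = estimate_total_weight(log_w, n, betas)
exact = (1 + alpha) ** n                         # 3^10 = 59049
```

The telescoping through intermediate temperatures is what makes this tractable: a naive uniform-sampling estimate of \(2^n \cdot \mathbb{E}[w(x)]\) has enormous variance when the weights span many orders of magnitude, while each individual ratio in the product is a bounded, well-behaved expectation.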

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68Q32 Computational learning theory
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)

Software:

C4.5; AdaBoost.MH; UCI-ml

References:

[1] C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
[2] A. Blum, P. Chalasani, J. Jackson, On learning embedded symmetric concepts, in: Proceedings of the Sixth Annual Workshop on Computational Learning Theory, ACM Press, New York, NY, 1993, pp. 337-346.
[3] A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, S. Rudich, Weakly learning DNF and characterizing statistical query learning using Fourier analysis, in: Proceedings of 26th ACM Symposium on Theory of Computing, 1994, pp. 253-262. · Zbl 1345.68186
[4] Bshouty, N. H., Simple learning algorithms using divide and conquer, Comput. Complexity, 6, 2, 174-194 (1997) · Zbl 0868.68094
[5] N. H. Bshouty, J. Jackson, C. Tamon, More efficient PAC-learning of DNF with membership queries under the uniform distribution, J. Comput. System Sci., to appear (early version in COLT 99). · Zbl 1072.68087
[6] Cesa-Bianchi, N.; Freund, Y.; Helmbold, D.; Haussler, D.; Schapire, R.; Warmuth, M., How to use expert advice, J. ACM, 44, 3, 427-485 (1997) · Zbl 0890.68066
[7] D. Chawla, L. Li, S.D. Scott, Efficiently approximating weighted sums with exponentially many terms, in: Proceedings of the 14th Annual Conference on Computational Learning Theory, 2001, pp. 82-98. · Zbl 0992.68092
[8] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (2000), Cambridge University Press: Cambridge
[9] Dietterich, T. G.; Lathrop, R. H.; Lozano-Perez, T., Solving the multiple-instance problem with axis-parallel rectangles, Artificial Intelligence, 89, 1-2, 31-71 (1997) · Zbl 1042.68650
[10] Dyer, M.; Frieze, A.; Kannan, R.; Kapoor, A.; Vazirani, U., A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem, Combin. Probab. Comput., 2, 271-284 (1993) · Zbl 0819.90094
[11] Freund, Y.; Schapire, R. E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. System Sci., 55, 1, 119-139 (1997) · Zbl 0880.68103
[12] C. Gentile, M. K. Warmuth, Linear hinge loss and average margin, in: M.S. Kearns, S.A. Solla, D.A. Cohn (Eds.), Advances in Neural Information Processing Systems, Vol. 11, MIT Press, Cambridge, MA, 1998, pp. 225-231.
[13] Goldman, S. A.; Kwek, S. K.; Scott, S. D., Agnostic learning of geometric patterns, J. Comput. System Sci., 6, 1, 123-151 (2001) · Zbl 0992.68175
[14] Goldman, S. A.; Scott, S. D., Multiple-instance learning of real-valued geometric patterns, Ann. Math. Artificial Intelligence, 39, 3, 259-290 (2003) · Zbl 1038.68055
[15] Helmbold, D. P.; Schapire, R. E., Predicting nearly as well as the best pruning of a decision tree, Mach. Learning, 27, 1, 51-68 (1997)
[16] M. Jerrum, A. Sinclair, The Markov chain Monte Carlo method: an approach to approximate counting and integration, in: D. Hochbaum (Ed.), Approximation Algorithms for NP-Hard Problems, PWS Pub., Boston, MA, 1996, pp. 482-520 (Chapter 12).
[17] Jerrum, M. R.; Valiant, L. G.; Vazirani, V. V., Random generation of combinatorial structures from a uniform distribution, Theoret. Comput. Sci., 43, 169-188 (1986) · Zbl 0597.68056
[18] R. Khardon, D. Roth, R. Servedio, Efficiency versus convergence of Boolean kernels for online learning algorithms, in: T.G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, Vol. 14, 2001, MIT Press, Cambridge, MA, pp. 423-430.
[19] Kirkpatrick, S.; Gelatt, C. D.; Vecchi, M. P., Optimization by simulated annealing, Science, 220, 671-680 (1983) · Zbl 1225.90162
[20] Kivinen, J.; Warmuth, M. K.; Auer, P., The perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant, Artificial Intelligence, 97, 1-2, 325-343 (1997) · Zbl 0904.68112
[21] Littlestone, N., Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm, Mach. Learning, 2, 285-318 (1988)
[22] N. Littlestone, From on-line to batch learning, in: Proceedings of the Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann, Los Altos, CA, 1989, pp. 269-284. · Zbl 0752.68066
[23] N. Littlestone, Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow, in: Proceedings of the Fourth Annual Workshop on Computational Learning Theory, Morgan Kaufmann, San Mateo, CA, 1991, pp. 147-156.
[24] Littlestone, N.; Warmuth, M. K., The weighted majority algorithm, Inform. and Comput., 108, 2, 212-261 (1994) · Zbl 0804.68121
[25] Maass, W.; Warmuth, M. K., Efficient learning with virtual threshold gates, Inform. and Comput., 141, 1, 66-83 (1998) · Zbl 0893.68127
[26] D.D. Margineantu, T.G. Dietterich, Pruning adaptive boosting, in: Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1997, pp. 211-218.
[27] Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E., Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087-1092 (1953) · Zbl 1431.65006
[28] B. Morris, A. Sinclair, Random walks on truncated cubes and sampling 0-1 knapsack solutions, SIAM J. Comput., to appear (early version in FOCS 99). · Zbl 1101.68044
[29] Müller, K.-R.; Mika, S.; Rätsch, G.; Tsuda, K.; Schölkopf, B., An introduction to kernel-based learning methods, IEEE Trans. Neural Networks, 12, 2, 181-201 (2001)
[30] Pereira, F.; Singer, Y., An efficient extension to mixture techniques for prediction and decision trees, Mach. Learning, 36, 3, 183-199 (1999) · Zbl 0948.68095
[31] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, CA, 1993.
[32] J.R. Quinlan, Bagging, boosting, and C4.5, in: Proceedings of the 13th National Conference on Artificial Intelligence, 1996, pp. 725-730.
[33] Rosenblatt, F., The perceptron: a probabilistic model for information storage and organization in the brain, Psych. Rev., 65, 386-407 (1958), (Reprinted in Neurocomputing, MIT Press, Cambridge, MA, 1988)
[34] Schapire, R. E.; Freund, Y.; Bartlett, P.; Lee, W. S., Boosting the margin: a new explanation for the effectiveness of voting methods, Ann. Statist., 26, 5, 1651-1686 (1998) · Zbl 0929.62069
[35] Schapire, R. E.; Singer, Y., Improved boosting algorithms using confidence-rated predictions, Mach. Learning, 37, 3, 297-336 (1999) · Zbl 0945.68194
[36] Schölkopf, B.; Smola, A. J., Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (2002), MIT Press: MIT Press Cambridge, MA
[37] S.D. Scott, J. Zhang, J. Brown, On generalized multiple-instance learning, Technical Report UNL-CSE-2003-5, Dept. of Computer Science, University of Nebraska, 2003.
[38] Sinclair, A., Improved bounds for mixing rates of Markov chains and multicommodity flow, Combin. Probab. Comput., 1, 351-370 (1992) · Zbl 0801.90039
[39] Takimoto, E.; Warmuth, M., Predicting nearly as well as the best pruning of a planar decision graph, Theoret. Comput. Sci., 288, 2, 217-235 (2002) · Zbl 1061.68131
[40] C. Tamon, J. Xiang, On the boosting pruning problem, in: Proceedings of the 11th European Conference on Machine Learning, Springer, Berlin, 2000, pp. 404-412.
[41] Q. Tao, S. Scott, An analysis of MCMC sampling methods for estimating weighted sums in Winnow, in: C.H. Dagli (Ed.), Artificial Neural Networks in Engineering, ASME Press, Fairfield, NJ, 2003, pp. 15-20.
[42] Q. Tao, S.D. Scott, A faster algorithm for generalized multiple-instance learning, in: Proceedings of the Seventeenth Annual FLAIRS Conference, AAAI Press, Miami Beach, FL, 2004, to appear.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.