Classification and regression trees. (English) Zbl 0541.62042

The Wadsworth Statistics/Probability Series. Belmont, California: Wadsworth International Group, a Division of Wadsworth, Inc. X, 358 p. $ 29.25; $ 18.95 (1984).
This book is about CART (Classification and Regression Trees), an extension of two earlier methodologies developed at the Institute for Social Research of the University of Michigan. AID (Automatic Interaction Detection) was first used in the early 1960s, at least in part, to grow regression trees. THAID, a sister methodology used primarily for developing classification rules, followed in the early 1970s. These procedures are not without problems, most notably the lack of a satisfactory stopping rule. CART attempts to circumvent this difficulty by diminishing the importance of the forward stopping rule in a clever way: the authors propose growing "large" trees and then pruning them back to a manageable size using a two-fold measure called "cost-complexity".
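The review does not spell out the cost-complexity measure. As a minimal sketch, assume its usual form, R_alpha(T) = R(T) + alpha * (number of leaves of T), where R(T) is the total error over the leaves and alpha is a penalty per leaf. Pruning then repeatedly collapses the "weakest link", the internal node whose subtree buys the least error reduction per extra leaf. The names Node, toy_tree, weakest_link and prune, and the error counts in the toy tree, are hypothetical and chosen only for illustration; this is not the book's own code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    # Error R(t) the node would have if it were a leaf
    # (here: a made-up count of misclassified learning cases).
    error: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: Node) -> int:
    """Number of terminal nodes in the subtree rooted at node."""
    if node.is_leaf():
        return 1
    return leaves(node.left) + leaves(node.right)


def subtree_error(node: Node) -> float:
    """R(T_t): sum of the leaf errors in the subtree rooted at node."""
    if node.is_leaf():
        return node.error
    return subtree_error(node.left) + subtree_error(node.right)


def cost_complexity(node: Node, alpha: float) -> float:
    """Assumed measure: error plus alpha times the number of leaves."""
    return subtree_error(node) + alpha * leaves(node)


def weakest_link(node: Node) -> Optional[Tuple[float, Node]]:
    """Internal node minimizing g(t) = (R(t) - R(T_t)) / (leaves(T_t) - 1)."""
    if node.is_leaf():
        return None
    g = (node.error - subtree_error(node)) / (leaves(node) - 1)
    best = (g, node)
    for child in (node.left, node.right):
        candidate = weakest_link(child)
        if candidate is not None and candidate[0] < best[0]:
            best = candidate
    return best


def prune(root: Node, alpha: float) -> Node:
    """Collapse weakest links in place while their g value is at most alpha."""
    while True:
        link = weakest_link(root)
        if link is None or link[0] > alpha:
            return root
        link[1].left = link[1].right = None  # turn the node into a leaf


def toy_tree() -> Node:
    """A hand-built 'large' tree with three leaves (illustrative numbers)."""
    return Node(40, left=Node(10, Node(4), Node(5)), right=Node(20))
```

In this toy tree the split under the left child lowers the error by only one case, so any alpha of at least 1 removes it, while the root split survives until alpha reaches 10. Larger values of alpha therefore yield smaller trees, which is how the size of the final tree can be controlled after growing rather than by a stopping rule during growing.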
This book consists of twelve chapters: 1. Background; 2. Introduction to tree classification; 3. Right sized trees and honest estimates; 4. Splitting rules; 5. Strengthening and interpreting; 6. Medical diagnosis and prognosis; 7. Mass spectra classification; 8. Regression trees; 9. Bayes rules and partitions; 10. Optimal pruning; 11. Construction of trees from a learning sample; and 12. Consistency. There is a good list of references at the end of the book, but there are no exercises. The first eight chapters give an expository discussion of the use of binary trees as a nonparametric classification procedure. The last four chapters provide the theoretical framework for the methodology.
The combination of algorithmic, consultative, and theoretical material, together with the generous helping of live and simulated data sets, makes this work unusually worthwhile.
Reviewer: A.K.Gupta

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-02 Research exposition (monographs, survey articles) pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P99 Applications of statistics