
Hierarchical clustering with prototypes via minimax linkage. (English) Zbl 1229.62083

Summary: Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a data set. The nature of the clustering depends on the choice of linkage, that is, on how one measures the distance between clusters. We investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original data set with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage) and are robust against certain perturbations of a data set. We provide an efficient implementation and illustrate minimax linkage’s strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a data set of images of human faces.
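The summary does not spell out the linkage itself, so here is a minimal sketch in Python under an assumed definition: the minimax distance between two clusters G and H is taken to be the smallest radius of a ball, centered at a point of G ∪ H, that covers all of G ∪ H, and the center achieving that radius serves as the prototype. The function names (`minimax_radius`, `minimax_linkage`, `minimax_agglomerate`), the Euclidean metric, and the brute-force search are illustrative choices only; this is not the efficient implementation the paper refers to.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def minimax_radius(points, center):
    """Largest distance from `center` to any point in `points`."""
    return max(dist(center, p) for p in points)


def minimax_linkage(g, h):
    """Return (distance, prototype) for clusters g and h.

    Assumed definition: the distance is the minimum, over candidate
    centers c in the union of g and h, of the maximum distance from c
    to any point of the union. The minimizing c is the prototype.
    """
    union = list(g) + list(h)
    prototype = min(union, key=lambda c: minimax_radius(union, c))
    return minimax_radius(union, prototype), prototype


def minimax_agglomerate(points):
    """Naive agglomerative clustering with minimax linkage.

    Returns one (height, prototype, members) tuple per merge, in merge
    order. Brute force, intended for small illustrative inputs only.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest minimax distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                height, proto = minimax_linkage(clusters[i], clusters[j])
                if best is None or height < best[0]:
                    best = (height, proto, i, j)
        height, proto, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((height, proto, merged))
    return merges


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    for height, proto, members in minimax_agglomerate(sample):
        print(height, proto, members)
```

Each merge records a prototype that is itself one of the original data points, which is what lets every interior node of the dendrogram be labeled with a representative observation. In this small example the merge heights never decrease, in line with the no-inversion property stated in the summary.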


62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T10 Pattern recognition, speech recognition
65C60 Computational problems in statistics (MSC2010)