
On the empirical estimation of integral probability metrics. (English) Zbl 1295.62035

Summary: Given two probability measures, \(\mathbb{P}\) and \(\mathbb{Q}\) defined on a measurable space, \(S\), the integral probability metric (IPM) is defined as \[ \gamma_{\mathcal F}(\mathbb{P},\mathbb{Q})=\sup\left\{\left| \int_{S}f d\mathbb{P}-\int_{S} f d\mathbb{Q}\right|\,:\,f\in\mathcal{F}\right\}, \] where \(\mathcal{F}\) is a class of real-valued bounded measurable functions on \(S\). By appropriately choosing \(\mathcal{F}\), various popular distances between \(\mathbb{P}\) and \(\mathbb{Q}\), including the Kantorovich metric, Fortet-Mourier metric, dual-bounded Lipschitz distance (also called the Dudley metric), total variation distance, and kernel distance, can be obtained.
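For orientation, the function classes usually associated with four of these distances can be written as below. The notation is an assumption introduced here and does not appear in the summary: \(\|f\|_{L}\) is the Lipschitz seminorm, \(\|f\|_{BL}=\|f\|_{\infty}+\|f\|_{L}\), and \(\mathcal{H}\) is a reproducing kernel Hilbert space. The total variation case holds up to the normalization convention in use. \[ \begin{array}{ll} \text{Kantorovich metric:} & \mathcal{F}=\{f:\|f\|_{L}\leq 1\},\\ \text{Dudley metric:} & \mathcal{F}=\{f:\|f\|_{BL}\leq 1\},\\ \text{total variation distance:} & \mathcal{F}=\{f:\|f\|_{\infty}\leq 1\},\\ \text{kernel distance:} & \mathcal{F}=\{f:\|f\|_{\mathcal{H}}\leq 1\}. \end{array} \]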
In this paper, we consider the problem of estimating \(\gamma_{\mathcal{F}}\) from finite random samples drawn i.i.d. from \(\mathbb{P}\) and \(\mathbb{Q}\). Although the above-mentioned distances cannot be computed in closed form for every \(\mathbb{P}\) and \(\mathbb{Q}\), we show that their empirical estimators are easily computable and strongly consistent (except for the total variation distance). We further analyze their rates of convergence. Based on these results, we discuss the advantages of certain choices of \(\mathcal{F}\) (and therefore of the corresponding IPMs) over others. In particular, the kernel distance is shown to have three favorable properties compared with the other distances mentioned: it is computationally cheaper, its empirical estimate converges at a faster rate to the population value, and its rate of convergence is independent of the dimension \(d\) of the space (for \(S=\mathbb{R}^{d}\)). We also provide a novel interpretation of IPMs and their empirical estimators by relating them to the problem of binary classification: while the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier, the smoothness of an appropriate binary classifier (e.g., support vector machine, Lipschitz classifier, etc.) is inversely related to the empirical estimator of the IPM between these class-conditional distributions.
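To illustrate why the empirical kernel distance is cheap to compute, here is a minimal Python sketch of the plug-in estimate, in which \(\mathbb{P}\) and \(\mathbb{Q}\) are replaced by the empirical measures of the two samples. The Gaussian kernel, the bandwidth parameter `sigma`, and the function names `gaussian_gram` and `empirical_kernel_distance` are assumptions chosen for illustration and are not taken from the summary.

```python
import numpy as np


def gaussian_gram(x, y, sigma=1.0):
    """Gram matrix k(x_i, y_j) for a Gaussian kernel (an illustrative choice)."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def empirical_kernel_distance(x, y, sigma=1.0):
    """Plug-in estimate of the kernel distance between two samples.

    x has shape (m, d) and y has shape (n, d). The result is the RKHS norm
    of the difference between the two empirical mean embeddings.
    """
    m, n = len(x), len(y)
    kxx = gaussian_gram(x, x, sigma)
    kyy = gaussian_gram(y, y, sigma)
    kxy = gaussian_gram(x, y, sigma)
    squared = kxx.sum() / m**2 + kyy.sum() / n**2 - 2.0 * kxy.sum() / (m * n)
    # Guard against a slightly negative value from round-off before the square root.
    return float(np.sqrt(max(squared, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_p = rng.normal(loc=0.0, size=(200, 5))
    sample_q = rng.normal(loc=1.0, size=(300, 5))
    print(empirical_kernel_distance(sample_p, sample_q))
```

The estimate needs only three Gram matrices, so its cost is quadratic in the total sample size and linear in the dimension \(d\). No optimization problem has to be solved.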


62G05 Nonparametric estimation
60B05 Probability measures on topological spaces


