Abstract
In certain settings, such as microarray studies, the sample information consists of a large number of small, possibly dependent data sets. In some applications, for example to perform clustering, the researcher wants to verify whether all data sets share a common distribution. We therefore propose a formal test of the null hypothesis that all data sets come from a single distribution. The asymptotic setting is one in which the number of small data sets goes to infinity while the sample size of each data set remains fixed. The asymptotic null distribution of the proposed test statistic is derived under mixing conditions on the sequence of small data sets, and the power of the test is investigated under two reasonable fixed alternatives. A simulation study shows that the test respects the nominal level and that its power tends to 1 as the number of data sets tends to infinity. An illustration involving microarray data is provided.
Acknowledgements
This work received financial support from the Call 2015 Grants for Ph.D. contracts for training of doctors of the Ministry of Economy and Competitiveness, co-financed by the European Social Fund (Ref. BES-2015-074958). We acknowledge support from the MTM2014-55966-P project, Ministry of Economy and Competitiveness, and the MTM2017-89422-P project, Ministry of Economy, Industry and Competitiveness, State Research Agency, and Regional Development Fund, EU. We also acknowledge the financial support provided by the SiDOR research group through the grant Competitive Reference Group, 2016–2019 (ED431C 2016/040), funded by the “Consellería de Cultura, Educación e Ordenación Universitaria. Xunta de Galicia.” Finally, the first author thanks the University of Vigo and its Escola Internacional de Doutoramento (EIDO) for the financial support provided through mobility doctorate grants. The authors also thank Professors Raymond J. Carroll and Robert Chapkin for allowing the use of their data.
Electronic supplementary material
Supplementary Materials:
Supplementary Material includes formal definitions of mixing dependence, stationarity and regularity conditions needed for the technical results, a remark about Theorem 5, the proof of Theorem 6, an additional real data analysis, and additional simulation results. (pdf 394KB)
Cite this article
Cousido-Rocha, M., de Uña-Álvarez, J. & Hart, J.D. Testing equality of a large number of densities under mixing conditions. TEST 28, 1203–1228 (2019). https://doi.org/10.1007/s11749-018-00625-3