Double empirical Bayes testing. (English) Zbl 07778691

Summary: Analysing data from large-scale, multiexperiment studies requires scientists both to analyse each experiment and to assess the results as a whole. In this article, we develop double empirical Bayes testing (DEBT), an empirical Bayes method for analysing multiexperiment studies when many covariates are gathered per experiment. DEBT is a two-stage method: in the first stage, it reports which experiments yielded significant outcomes, and in the second stage, it hypothesises which covariates drive the experimental significance. In both of its stages, DEBT builds on the work of Efron, who laid out an elegant empirical Bayes approach to testing. DEBT enhances this framework by learning a series of black box predictive models to boost power and control the false discovery rate. In Stage 1, it uses a deep neural network prior to report which experiments yielded significant outcomes. In Stage 2, it uses an empirical Bayes version of the knockoff filter to select covariates that significantly predict Stage 1 significance. In both simulated and real data, DEBT increases the proportion of discovered significant outcomes and selects more features when signals are weak. In a real study of cancer cell lines, DEBT selects a robust set of biologically plausible genomic drivers of drug sensitivity and resistance in cancer.
{© 2020 International Statistical Institute}
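
The empirical Bayes testing framework of Efron that the summary refers to is usually phrased as a two-groups model, in which each test statistic is either null or non-null and the local false discovery rate is the posterior probability of being null. The following is a minimal sketch of that idea only, not the DEBT method: it assumes a theoretical N(0, 1) null, a fixed null proportion `pi0` supplied by the user, and a plain Gaussian kernel density estimate in place of the deep neural network prior described above. All function names and default values are hypothetical.

```python
import math
import random


def std_normal_pdf(z):
    """Density of the assumed theoretical null, N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde(zs, bandwidth):
    """Gaussian kernel density estimate of the marginal density f of the z-scores."""
    n = len(zs)

    def f_hat(z):
        return sum(std_normal_pdf((z - zi) / bandwidth) for zi in zs) / (n * bandwidth)

    return f_hat


def local_fdr(zs, pi0=1.0, bandwidth=0.3):
    """Two-groups local fdr: pi0 * f0(z) / f(z), clipped to at most 1.

    pi0 = 1.0 is the conservative choice; bandwidth is an arbitrary assumption.
    """
    f_hat = kde(zs, bandwidth)
    return [min(1.0, pi0 * std_normal_pdf(z) / f_hat(z)) for z in zs]


def select(lfdr, alpha=0.1):
    """Reject the largest set of hypotheses whose mean local fdr is at most alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    rejected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += lfdr[i]
        # The running mean of an ascending sequence never decreases,
        # so the first exceedance is a safe stopping point.
        if running / rank > alpha:
            break
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    random.seed(0)
    # Simulated z-scores: 900 nulls from N(0, 1) and 100 signals from N(4, 1).
    zs = [random.gauss(0, 1) for _ in range(900)] + [random.gauss(4, 1) for _ in range(100)]
    lfdr = local_fdr(zs, pi0=0.9)
    print(len(select(lfdr, alpha=0.1)), "hypotheses rejected")
```

Averaging the local fdr over the rejected set gives an estimate of the Bayesian false discovery rate of that set, which is why the selection rule stops as soon as the running mean exceeds `alpha`.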

MSC:

62Cxx Statistical decision theory
62Fxx Parametric inference
62Pxx Applications of statistics

References:

[1] Barber, R.F. & Candès, E.J. 2015. Controlling the false discovery rate via knockoffs. Annals Stat., 43(5), 2055-2085. · Zbl 1327.62082
[2] Bates, S., Sesia, M., Sabatti, C. & Candès, E. 2020. Causal inference in genetic trio studies. arXiv preprint arXiv:2002.09644. · Zbl 1485.92072
[3] Benjamini, Y. & Hochberg, Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Society: Ser. B (Stat. Methodology), 289-300. · Zbl 0809.62014
[4] Candès, E., Fan, Y., Janson, L. & Lv, J. 2018. Panning for gold: ‘model‐X’ knockoffs for high dimensional controlled variable selection. J. Royal Stat. Society: Ser. B (Stat. Methodology). · Zbl 1398.62335
[5] Efron, B. 2003. Robbins, empirical Bayes and microarrays. Annals Stat., 31(2), 366-378. · Zbl 1038.62099
[6] Efron, B. 2004. Large‐scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Amer. Stat. Assoc., 99(465), 96-104. · Zbl 1089.62502
[7] Efron, B. 2005. Local false discovery rates. Available at: http://statweb.stanford.edu/ ckirby/brad/papers/2005LocalFDR.pdf
[8] Efron, B. 2008. Microarrays, empirical Bayes and the two‐groups model. Stat. Sci., 1-22. · Zbl 1327.62046
[9] Efron, B. 2019. Bayes, oracle Bayes and empirical Bayes. Stat. Sci., 34(2), 177-201. · Zbl 1420.62023
[10] Efron, B. & Tibshirani, R. 1996. Using specially designed exponential families for density estimation. Annals Stat., 24(6), 2431-2461. · Zbl 0878.62028
[11] Efron, B., Tibshirani, R., Storey, J.D. & Tusher, V. 2001. Empirical Bayes analysis of a microarray experiment. J. Amer. Stat. Assoc., 96(456), 1151-1160. · Zbl 1073.62511
[12] Fernández‐Majada, V., Welz, P.‐S., Ermolaeva, M.A., Schell, M., Adam, A., Dietlein, F., Komander, D., Büttner, R., Thomas, R.K. & Schumacher, B. 2016. The tumour suppressor CYLD regulates the p53 DNA damage response. Nat. Commun., 7(1), 1-14.
[13] Freedman, D.A. 1983. A note on screening regression equations. Amer. Stat., 37(2), 152-155.
[14] Garnett, M.J., Edelman, E.J., Heidorn, S.J. et al. 2012. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature, 483(7391), 570.
[15] Iorio, F., Knijnenburg, T.A., Vis, D.J., Bignell, G.R., Menden, M.P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S. & Lightfoot, H. 2016. A landscape of pharmacogenomic interactions in cancer. Cell, 166(3), 740-754.
[16] Katsevich, E. & Ramdas, A. 2020a. The leave‐one‐covariate‐out conditional randomization test. arXiv preprint arXiv:2006.08482.
[17] Katsevich, E. & Ramdas, A. 2020b. A theoretical treatment of conditional independence testing under Model‐X. arXiv preprint arXiv:2005.05506.
[18] Laurie, N.A., Donovan, S.L., Shih, C.‐S. et al. 2006. Inactivation of the p53 pathway in retinoblastoma. Nature, 444(7115), 61-66.
[19] Lei, L. & Fithian, W. 2018. AdaPT: an interactive procedure for multiple testing with side information. J. Royal Stat. Society. · Zbl 1398.62049
[20] Li, A. & Barber, R.F. 2017. Accumulation tests for FDR control in ordered hypothesis testing. J. Amer. Stat. Assoc., 112(518), 837-849.
[21] Liu, M., Katsevich, E., Janson, L. & Ramdas, A. 2020. Fast and powerful conditional randomization testing via distillation. arXiv preprint arXiv:2006.08482.
[22] Liu, Y. & Zheng, C. 2018. Auto‐encoding knockoff generator for FDR controlled variable selection. arXiv preprint arXiv:1809.10765.
[23] Mao, J.‐H., Kim, I.‐J., Wu, D. et al. 2008. FBXW7 targets mTOR for degradation and cooperates with PTEN in tumor suppression. Science, 321(5895), 1499-1502.
[24] Mazoure, B., Nadon, R. & Makarenkov, V. 2017. Identification and correction of spatial bias are essential for obtaining quality data in high‐throughput screening technologies. Sci. Reports, 7(1), 11921.
[25] Newton, M.A. 2002. A nonparametric recursive estimator of the mixing distribution. Sankhya Ser. A, 64, 306-22. · Zbl 1192.62110
[26] Puszynski, K., Gandolfi, A. & d’Onofrio, A. 2014. The pharmacodynamics of the p53‐MDM2 targeting drug Nutlin: the role of gene‐switching noise. PLoS Comput. Biol., 10(12), e1003991.
[27] Ramdas, A., Barber, R.F., Wainwright, M.J. & Jordan, M.I. 2017. A unified treatment of multiple testing with prior knowledge. arXiv preprint arXiv:1703.06222.
[28] Scott, J.G., Kelly, R.C., Smith, M.A., Zhou, P. & Kass, R.E. 2015. False discovery rate regression: an application to neural synchrony detection in primary visual cortex. J. Amer. Stat. Assoc., 110(510), 459-471.
[29] Tansey, W., Koyejo, O., Poldrack, R.A. & Scott, J.G. 2017. False discovery rate smoothing. J. Amer. Stat. Assoc. · Zbl 1402.62011
[30] Tansey, W., Veitch, V., Zhang, H., Rabadan, R. & Blei, D.M. 2018. The holdout randomization test: principled and easy black box feature selection. arXiv preprint arXiv:1811.00645.
[31] Tansey, W., Wang, Y., Blei, D. & Rabadan, R. 2018. Black box FDR. In International conference on machine learning, pp. 4874-4883.
[32] Tokdar, S., Martin, R. & Ghosh, J.K. 2009. Consistency of a recursive estimate of mixing distributions. Annals Stat., 37(5A), 2502-22. · Zbl 1173.62020
[33] Walter, R.F.H., Mairinger, F.D., Ting, S. et al. 2015. MDM2 is an important prognostic and predictive factor for platin-pemetrexed therapy in malignant pleural mesotheliomas and deregulation of P14/ARF (encoded by CDKN2A) seems to contribute to an MDM2‐driven inactivation of P53. British J. Cancer, 112(5), 883-890.
[34] Weinberg, R. 2013. The Biology of Cancer. Garland Science.
[35] Wu, Y., Boos, D.D. & Stefanski, L.A. 2007. Controlling variable selection by the addition of pseudovariables. J. Amer. Stat. Assoc., 102(477), 235-243. · Zbl 1284.62242
[36] Xia, F., Zhang, M.J., Zou, J.Y. & Tse, D. 2017. NeuralFDR: learning discovery thresholds from hypothesis features. In Advances in neural information processing systems, pp. 1540-1549.
[37] Yang, W., Soares, J., Greninger, P. et al. 2012. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res., 41(D1), D955-D961.