
Multiple change-points detection by empirical Bayesian information criteria and Gibbs sampling induced stochastic search. (English) Zbl 1481.62062

Summary: Uncovering hidden change-points in an observed signal sequence is challenging both mathematically and computationally. We tackle this by developing an innovative methodology based on Markov chain Monte Carlo and statistical information theory. It consists of an empirical Bayesian information criterion (emBIC) to assess the fit and merit of candidate change-point configurations, and a stochastic search algorithm induced by Gibbs sampling to find the optimal change-point configuration. Our emBIC is derived by treating the unknown change-point locations as latent data rather than as parameters, as the traditional BIC does. This yields a significant improvement over the traditional BIC, which is known to over-detect change-points in most cases. The Gibbs-sampler-induced search finds the optimal change-point configuration quickly and with high probability, without resorting to computationally infeasible enumeration. We also integrate the Gibbs-sampler-induced search with an existing BIC-based sequential testing method for change-points, significantly improving that method's performance and computational feasibility. We further develop two comprehensive three-step computing procedures to implement the proposed methodology in practice. Finally, simulation studies and real examples analyzing business and genetic data are presented to illustrate and assess the procedures.

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
60J22 Computational methods in Markov chains
62C10 Bayesian problems; characterization of Bayes procedures
62C12 Empirical decision procedures; empirical Bayes procedures
65C40 Numerical analysis or methods applied to Markov chains

Software:

DNAcopy
