
Principal component analysis for second-order stationary vector time series. (English) Zbl 1454.62255

Summary: We extend principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation of a \(p\)-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Those lower-dimensional series can therefore be analyzed separately as far as the linear dynamic structure is concerned. Technically, the problem boils down to an eigenanalysis of a positive definite matrix. When \(p\) is large, an additional permutation step is required, based either on maximum cross-correlations or on false discovery rate (FDR) control via multiple tests. The asymptotic theory is established for both fixed \(p\) and diverging \(p\) as the sample size \(n\) tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, leading to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation, which still offers advantages in, for example, forecasting future values. The method can also be adapted to segment multiple volatility processes.
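The eigenanalysis step described above can be sketched in a few lines. The following is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`lagged_autocov`, `tspca_transform`), the choice of the number of pooled lags `k0`, and the omission of the subsequent permutation/grouping step are choices made here purely for illustration.

```python
import numpy as np

def lagged_autocov(x, k):
    """Sample autocovariance matrix Sigma(k) of an (n, p) series at lag k."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[k:].T @ xc[:n - k] / n

def tspca_transform(y, k0=5):
    """Sketch of a contemporaneous linear transformation x_t = B (y_t - ybar)
    whose components are (approximately) uncorrelated across lags.

    y  : (n, p) array of observations.
    k0 : number of lags pooled (illustrative default).
    Returns (x, B) with the transformed series x and the transformation B.
    """
    # Step 1: standardize so the contemporaneous covariance is I_p
    # (assumes the sample covariance is positive definite).
    S0 = lagged_autocov(y, 0)
    vals, vecs = np.linalg.eigh(S0)
    S0_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    z = (y - y.mean(axis=0)) @ S0_inv_half.T

    # Step 2: eigenanalysis of the positive semidefinite matrix
    # W = sum_{k=1}^{k0} Sigma_z(k) Sigma_z(k)^T pooled over lags.
    p = y.shape[1]
    W = np.zeros((p, p))
    for k in range(1, k0 + 1):
        Sk = lagged_autocov(z, k)
        W += Sk @ Sk.T
    _, G = np.linalg.eigh(W)   # eigenvectors in the columns of G

    # Step 3: transformed series; grouping its components into uncorrelated
    # subseries (e.g., via maximum cross-correlations or FDR-based multiple
    # tests) is a separate permutation step not shown here.
    B = G.T @ S0_inv_half
    return z @ G, B

# Illustrative usage on simulated data:
# y = np.random.randn(500, 6)
# x, B = tspca_transform(y, k0=5)
```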

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H25 Factor analysis and principal components; correspondence analysis

Software:

FastICA; itsmr
