×

Sparse group fused Lasso for model segmentation: a hybrid approach. (English) Zbl 07433033

Summary: This article introduces the sparse group fused lasso (SGFL) as a statistical framework for segmenting sparse regression models with multivariate time series. To compute solutions of the SGFL, a nonsmooth and nonseparable convex program, we develop a hybrid optimization method that is fast, requires no tuning parameter selection, and is guaranteed to converge to a global minimizer. In numerical experiments, the hybrid method compares favorably to state-of-the-art techniques with respect to computation time and numerical accuracy; benefits are particularly substantial in high dimension. The method’s statistical performance is satisfactory in recovering nonzero regression coefficients and excellent in change point detection. An application to air quality data is presented. The hybrid method is implemented in the R package sparseGFL available on the author’s Github page.

MSC:

37M10 Time series analysis of dynamical systems
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
65K10 Numerical optimization and variational techniques

References:

[1] Alaíz, CM; Jiménez, ÁB; Dorronsoro, JR, Group fused lasso, Artif Neural Netw Mach Learn, 2013, 66-73 (2013)
[2] Alewijnse, SPA; Buchin, K.; Buchin, M.; Sijben, S.; Westenberg, MA, Model-based segmentation and classification of trajectories, Algorithmica, 80, 8, 2422-2452 (2018) · Zbl 1392.68423 · doi:10.1007/s00453-017-0329-x
[3] Bai, J., Estimating multiple breaks one at a time, Econom Theory, 13, 3, 315-352 (1997) · doi:10.1017/S0266466600005831
[4] Bai, J.; Perron, P., Computation and analysis of multiple structural change models, J Appl Econom, 18, 1, 1-22 (2003) · doi:10.1002/jae.659
[5] Barbero A, Sra S (2011) Fast Newton-type methods for total variation regularization. In: Proceedings of the 28th international conference on machine learning, ICML 2011, pp 313-320
[6] Basseville, M.; Nikiforov, IV, Detection of abrupt changes: theory and application. Prentice Hall information and system sciences series (1993), Englewood Cliffs: Prentice Hall Inc, Englewood Cliffs · Zbl 1407.62012
[7] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sci, 2, 1, 183-202 (2009) · Zbl 1175.94009 · doi:10.1137/080716542
[8] Becker, S.; Bobin, J.; Candès, EJ, NESTA: a fast and accurate first-order method for sparse recovery, SIAM J Imaging Sci, 4, 1, 1-39 (2011) · Zbl 1209.90265 · doi:10.1137/090756855
[9] Beer, JC; Aizenstein, HJ; Anderson, SJ; Krafty, RT, Incorporating prior information with fused sparse group lasso: application to prediction of clinical measures from neuroimages, Biometrics, 75, 4, 1299-1309 (2019) · Zbl 1448.62151 · doi:10.1111/biom.13075
[10] Bertsekas, DP, Convex optimization algorithms (2015), Belmont: Athena Scientific, Belmont · Zbl 1347.90001
[11] Bleakley K, Vert JP (2011) The group fused lasso for multiple change-point detection. Technical Report hal-00602121. https://hal.archives-ouvertes.fr/hal-00602121. Accessed 15 Oct 2020
[12] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found Trends Mach Learn, 3, 1, 1-122 (2011) · Zbl 1229.90122 · doi:10.1561/2200000016
[13] Bredies, K.; Lorenz, DA, Linear convergence of iterative soft-thresholding, J Fourier Anal Appl, 14, 5-6, 813-837 (2008) · Zbl 1175.65061 · doi:10.1007/s00041-008-9041-1
[14] Cao, P.; Liu, X.; Liu, H.; Yang, J.; Zhao, D.; Huang, M.; Zaiane, O., Generalized fused group lasso regularized multi-task feature learning for predicting cognitive outcomes in Alzheimers disease, Comput Methods Programs Biomed, 162, 19-45 (2018) · doi:10.1016/j.cmpb.2018.04.028
[15] Chen, J.; Chen, Z., Extended Bayesian information criteria for model selection with large model spaces, Biometrika, 95, 3, 759-771 (2008) · Zbl 1437.62415 · doi:10.1093/biomet/asn034
[16] Chen, X.; Lin, Q.; Kim, S.; Carbonell, JG; Xing, EP, Smoothing proximal gradient method for general structured sparse regression, Ann Appl Stat, 6, 2, 719-752 (2012) · Zbl 1243.62100 · doi:10.1214/11-AOAS514
[17] Chi, EC; Lange, K., Splitting methods for convex clustering, J Comput Graph Stat, 24, 4, 994-1013 (2015) · doi:10.1080/10618600.2014.948181
[18] Combettes, PL; Pesquet, JC, Fixed-point algorithms for inverse problems in science and engineering, chap. proximal splitting methods in signal processing, 185-212 (2011), New York: Springer, New York · Zbl 1242.90160
[19] Condat, L., A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms, J Optim Theory Appl, 158, 2, 460-479 (2013) · Zbl 1272.90110 · doi:10.1007/s10957-012-0245-9
[20] De Vito, S.; Massera, E.; Piga, M.; Martinotto, L.; Di Francia, G., On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sens Actuators B Chem, 129, 2, 750-757 (2008) · doi:10.1016/j.snb.2007.09.060
[21] De Vito, S.; Piga, M.; Martinotto, L.; Di Francia, G., \(Co,No_2\) and \(No_x\) urban pollution monitoring with on-field calibrated electronic nose by automatic Bayesian regularization, Sens Actuators B Chem, 143, 1, 182-191 (2009) · doi:10.1016/j.snb.2009.08.041
[22] Friedman, J.; Hastie, T.; Höfling, H.; Tibshirani, R., Pathwise coordinate optimization, Ann Appl Stat, 1, 2, 302-332 (2007) · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[23] Fryzlewicz, P., Wild binary segmentation for multiple change-point detection, Ann Stat, 42, 6, 2243 (2014) · Zbl 1302.62075 · doi:10.1214/14-AOS1245
[24] Hadj-Selem, F.; Löfstedt, T.; Dohmatob, E.; Frouin, V.; Dubois, M.; Guillemot, V.; Duchesnay, E., Continuation of Nesterov’s smoothing for regression with structured sparsity in high-dimensional neuroimaging, IEEE Trans Med Imaging, 37, 11, 2403-2413 (2018) · doi:10.1109/TMI.2018.2829802
[25] Hallac, D.; Nystrup, P.; Boyd, S., Greedy Gaussian segmentation of multivariate time series, Adv Data Anal Classif, 13, 3, 727-751 (2019) · Zbl 1459.62170 · doi:10.1007/s11634-018-0335-0
[26] Hocking T, Vert JP, Bach FR, Joulin A (2011) Clusterpath: an algorithm for clustering using convex fusion penalties. In: ICML
[27] Hoefling, H., A path algorithm for the fused lasso signal approximator, J Comput Graph Stat, 19, 4, 984-1006 (2010) · doi:10.1198/jcgs.2010.09208
[28] Kim, S.; Xing, EP, Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping, Ann Appl Stat, 6, 3, 1095-1117 (2012) · Zbl 1254.62112 · doi:10.1214/12-AOAS549
[29] Kuhn, HW, A note on Fermat’s problem, Mat Program, 4, 98-107 (1973) · Zbl 0255.90063 · doi:10.1007/BF01584648
[30] Leonardi F, Bühlmann P (2016) Computationally efficient change point detection for high-dimensional regression
[31] Li, Y.; Osher, S., Coordinate descent optimization for \(\ell^1\) minimization with application to compressed sensing; a greedy algorithm, Inverse Probl Imaging, 3, 3, 487-503 (2009) · Zbl 1188.90196 · doi:10.3934/ipi.2009.3.487
[32] Li, X.; Mo, L.; Yuan, X.; Zhang, J., Linearized alternating direction method of multipliers for sparse group and fused LASSO models, Comput Stati Data Anal, 79, 203-221 (2014) · Zbl 1506.62112 · doi:10.1016/j.csda.2014.05.017
[33] Liu J, Yuan L, Ye J (2010) An efficient algorithm for a class of fused lasso problems. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’10. ACM, pp 323-332
[34] Nesterov, Y., Smooth minimization of non-smooth functions, Math Program, 103, 1, 127-152 (2005) · Zbl 1079.90102 · doi:10.1007/s10107-004-0552-5
[35] Nesterov, Y., Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J Optim, 22, 2, 341-362 (2012) · Zbl 1257.90073 · doi:10.1137/100802001
[36] Nystrup, P.; Madsen, H.; Lindström, E., Long memory of financial time series and hidden Markov models with time-varying parameters, J Forecast, 36, 8, 989-1002 (2017) · Zbl 1397.60104 · doi:10.1002/for.2447
[37] Ohlsson, H.; Ljung, L.; Boyd, S., Segmentation of ARX-models using sum-of-norms regularization, Automatica, 46, 6, 1107-1111 (2010) · Zbl 1191.93139 · doi:10.1016/j.automatica.2010.03.013
[38] Ombao, H.; von Sachs, R.; Guo, W., Slex analysis of multivariate nonstationary time series, J Am Stat Assoc, 100, 470, 519-531 (2005) · Zbl 1117.62407 · doi:10.1198/016214504000001448
[39] Price, BS; Geyer, CJ; Rothman, AJ, Automatic response category combination in multinomial logistic regression, J Comput Graph Stat, 28, 3, 758-766 (2019) · Zbl 07499092 · doi:10.1080/10618600.2019.1585258
[40] R Core Team (2019) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Accessed 15 Oct 2020
[41] Ranalli, M.; Lagona, F.; Picone, M.; Zambianchi, E., Segmentation of sea current fields by cylindrical hidden Markov models: a composite likelihood approach, J R Stat Soc Ser C (Appl Stat), 67, 3, 575-598 (2018) · doi:10.1111/rssc.12240
[42] Rockafellar, R., Convex analysis. Princeton landmarks in mathematics and physics (2015), Princeton: Princeton University Press, Princeton
[43] Sanderson, C.; Curtin, R., Armadillo: a template-based C++ library for linear algebra, J Open Source Softw, 1, 26 (2016) · doi:10.21105/joss.00026
[44] Saxén, JE; Saxén, H.; Toivonen, HT, Identification of switching linear systems using self-organizing models with application to silicon prediction in hot metal, Appl Soft Comput, 47, 271-280 (2016) · doi:10.1016/j.asoc.2016.05.048
[45] Shor NZ (1985) Minimization methods for nondifferentiable functions, Springer series in computational mathematics, vol 3. Springer, Berlin (Translated from the Russian by K. C. Kiwiel and A. Ruszczyński) · Zbl 0561.90058
[46] Songsiri J (2015) Learning multiple granger graphical models via group fused lasso. In: 2015 10th Asian control conference (ASCC), pp 1-6
[47] Tibshirani, R.; Wang, P., Spatial smoothing and hot spot detection for CGH data using the fused lasso, Biostatistics, 9, 1, 18-29 (2007) · Zbl 1274.62886 · doi:10.1093/biostatistics/kxm013
[48] Tibshirani, R.; Bien, J.; Friedman, J.; Hastie, T.; Simon, N.; Taylor, J.; Tibshirani, RJ, Strong rules for discarding predictors in lasso-type problems, J R Stat Soc Ser B Stat Methodol, 74, 2, 245-266 (2012) · Zbl 1411.62213 · doi:10.1111/j.1467-9868.2011.01004.x
[49] Truong C, Oudre L, Vayatis N (2018) A review of change point detection methods. arXiv:1801.00718 · Zbl 07160286
[50] Tseng, P., Convergence of a block coordinate descent method for nondifferentiable minimization, J Optim Theory Appl, 109, 3, 475-494 (2001) · Zbl 1006.65062 · doi:10.1023/A:1017501703105
[51] Vũ, BC, A variable metric extension of the forward-backward-forward algorithm for monotone operators, Numer Funct Anal Optim, 34, 9, 1050-1065 (2013) · Zbl 1279.47075 · doi:10.1080/01630563.2013.763825
[52] Wang, T.; Zhu, L., Consistent tuning parameter selection in high dimensional sparse linear regression, J Multivar Anal, 102, 7, 1141-1151 (2011) · Zbl 1216.62103 · doi:10.1016/j.jmva.2011.03.007
[53] Wang, J.; Fan, W.; Ye, J., Fused lasso screening rules via the monotonicity of subdifferentials, IEEE Trans Pattern Anal Mach Intell, 37, 9, 1806-1820 (2015) · doi:10.1109/TPAMI.2014.2388203
[54] Wang, J.; Wonka, P.; Ye, J., Lasso screening rules via dual polytope projection, J Mach Learn Res, 16, 1063-1101 (2015) · Zbl 1360.62403
[55] Wang, B.; Zhang, Y.; Sun, WW; Fang, Y., Sparse convex clustering, J Comput Graph Stat, 27, 2, 393-403 (2018) · Zbl 07498956 · doi:10.1080/10618600.2017.1377081
[56] Weiszfeld, E.; Plastria, F., On the point for which the sum of the distances to n given points is minimum, Ann Oper Res, 167, 1, 7-41 (2009) · Zbl 1176.90616 · doi:10.1007/s10479-008-0352-z
[57] Wytock, M.; Sra, S.; Kolter, JZ, Fast Newton methods for the group fused lasso, Uncertain Artif Intell, 2014, 888-897 (2014)
[58] Xu, Y.; Lindquist, M., Dynamic connectivity detection: an algorithm for determining functional connectivity change points in fMRI data, Front eurosci, 9, 285 (2015)
[59] Yan, M., A new primal-dual algorithm for minimizing the sum of three functions with a linear operator, J Sci Comput, 76, 3, 1698-1717 (2018) · Zbl 1415.65142 · doi:10.1007/s10915-018-0680-3
[60] Yao, YC, Estimating the number of change-points via Schwarz’ criterion, Stat Probab Lett, 6, 3, 181-189 (1988) · Zbl 0642.62016 · doi:10.1016/0167-7152(88)90118-6
[61] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J R Stat Soc Ser B Stat Methodol, 68, 1, 49-67 (2006) · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[62] Zhou, J.; Liu, J.; Narayan, VA; Ye, J., Modeling disease progression via multi-task learning, NeuroImage, 78, 233-248 (2013) · doi:10.1016/j.neuroimage.2013.03.073
[63] Zhu C, Xu H, Leng C, Yan S (2014) Convex optimization procedure for clustering: theoretical revisit. In: NIPS
[64] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, J R Stat Soc Ser B Stat Methodol, 67, 2, 301-320 (2005) · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.