The discriminative functional mixture model for a comparative analysis of bike sharing systems. (English) Zbl 1397.62511

Summary: Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now offered in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. Analyzing the mass of data generated by such systems is of particular interest to BSS providers, who can use it to update system structures and policies. This work was motivated by an interest in analyzing and comparing several European BSSs in order to identify common operating patterns and to propose practical solutions to avoid potential issues. Our approach relies on the identification of common patterns between and within systems. To this end, a model-based clustering method for time series (or, more generally, functional data), called FunFEM, is developed. It is based on a functional mixture model that allows the data to be clustered in a discriminative functional subspace. In this context, the model has the advantage of being parsimonious and of allowing the clustered systems to be visualized. Numerical experiments confirm the good behavior of FunFEM, particularly in comparison with state-of-the-art methods. Applying FunFEM to BSS data from JCDecaux and the Transport for London Initiative allows us to identify 10 general patterns, including pathological ones, and to propose practical improvement strategies based on the system comparison. The visualization of the clustered data within the discriminative subspace turns out to be particularly informative regarding system efficiency. The proposed methodology is implemented in an R package named funFEM, which is available on CRAN. The package also provides a subset of the data analyzed in this work.
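
To make the general idea of clustering curves through a basis expansion more concrete, here is a minimal Python sketch. It is not the FunFEM algorithm and not the funFEM package's interface: it replaces the functional mixture model with plain k-means and does not estimate a discriminative subspace. The toy "station loading" curves, the function names, and all parameter values are assumptions made only for illustration.

```python
import math
import random


def fourier_coefficients(curve, n_harmonics=2):
    """Least-squares Fourier coefficients of a curve on a uniform grid.

    On a uniform grid covering one full period, the sampled basis functions
    are orthogonal (for harmonics below half the grid size), so the
    least-squares fit reduces to simple averages.
    """
    n = len(curve)
    coefs = [sum(curve) / n]  # constant term
    for k in range(1, n_harmonics + 1):
        cos_k = sum(y * math.cos(2 * math.pi * k * t / n) for t, y in enumerate(curve))
        sin_k = sum(y * math.sin(2 * math.pi * k * t / n) for t, y in enumerate(curve))
        coefs += [2 * cos_k / n, 2 * sin_k / n]
    return coefs


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-first initialization.

    This is a simplified stand-in for fitting a mixture model.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_toy_curves(n_per_group=10, n_hours=24, seed=1):
    """Hypothetical hourly loading profiles: 10 peaking at 8h, then 10 peaking at 18h."""
    rng = random.Random(seed)
    curves = []
    for peak_hour in (8, 18):
        for _ in range(n_per_group):
            curves.append([
                0.5 + 0.3 * math.cos(2 * math.pi * (t - peak_hour) / n_hours) + rng.gauss(0, 0.05)
                for t in range(n_hours)
            ])
    return curves


curves = make_toy_curves()
coefficients = [fourier_coefficients(c) for c in curves]
labels = kmeans(coefficients, k=2)
print(labels)
```

Working on a handful of basis coefficients rather than on the raw hourly values is what keeps this kind of approach low-dimensional. The method described in the summary goes further by fitting a mixture model within a discriminative subspace, which this sketch does not attempt.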

MSC:

62P20 Applications of statistics to economics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
