Functional distributional clustering using spatio-temporal data. (English) Zbl 07706325

Summary: This paper presents a new method, the functional distributional clustering algorithm (FDCA), which seeks to identify spatially contiguous clusters while incorporating changes in temporal patterns across overcrowded networks. The method is motivated by a graph-based network of sensors arranged over space, where the observations recorded at each sensor follow a multi-modal distribution. The proposed method is fully non-parametric and generates clusters within an agglomerative hierarchical clustering approach, based on a measure of distance that defines a cumulative distribution function over temporal changes for different locations in space. Traditional hierarchical clustering algorithms that are spatially adapted do not typically accommodate the temporal characteristics of the underlying data. The effectiveness of the FDCA is illustrated using an application to both empirical and simulated data from about 400 sensors in a 2.5-square-mile network area in downtown San Francisco, California. The results demonstrate the superior ability of the FDCA to identify true clusters compared with functional-only and distributional-only algorithms, and performance similar to that of a model-based clustering algorithm.
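
The general idea of agglomerative clustering under a spatial contiguity constraint, with a distance computed between distribution functions, can be illustrated with a minimal Python sketch. This is not the FDCA itself: it pools each sensor's observations over time and so ignores the temporal (functional) component described in the summary. The mean absolute gap between empirical CDFs, the average linkage, and the names `ecdf_on_grid`, `cdf_distance` and `contiguous_agglomerative` are all assumptions made for illustration.

```python
from bisect import bisect_right


def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def cdf_distance(f, g):
    """Mean absolute gap between two CDFs on a shared grid (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(f, g)) / len(f)


def contiguous_agglomerative(samples, edges, n_clusters, grid):
    """Average-linkage merging restricted to clusters joined by a graph edge.

    samples: dict mapping sensor id -> list of observations
    edges: iterable of (sensor id, sensor id) pairs giving spatial adjacency
    """
    cdfs = {s: ecdf_on_grid(obs, grid) for s, obs in samples.items()}
    clusters = {s: {s} for s in samples}        # cluster label -> members
    neighbours = {s: set() for s in samples}    # cluster label -> adjacent labels
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def linkage(a, b):
        # Average pairwise CDF distance between the members of two clusters.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(cdf_distance(cdfs[i], cdfs[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Only adjacent clusters are candidates, which keeps clusters contiguous.
        candidates = [(linkage(a, b), a, b)
                      for a in clusters for b in neighbours[a] if a < b]
        if not candidates:      # disconnected graph: nothing left to merge
            break
        _, a, b = min(candidates)
        # Merge b into a and rewire b's adjacency to point at a.
        clusters[a] |= clusters.pop(b)
        for c in neighbours.pop(b):
            neighbours[c].discard(b)
            if c != a:
                neighbours[c].add(a)
                neighbours[a].add(c)
    return sorted(sorted(members) for members in clusters.values())


# Four hypothetical sensors on a line graph 0-1-2-3.
toy_samples = {0: [1, 2, 3], 1: [1, 2, 4], 2: [10, 11, 12], 3: [10, 12, 13]}
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_result = contiguous_agglomerative(toy_samples, toy_edges, 2, range(15))
```

On this toy input the two low-valued sensors and the two high-valued sensors end up in separate clusters. Because only clusters joined by an edge are merge candidates, two sensors with similar distributions that are not connected in the graph cannot be grouped directly.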

MSC:

62-XX Statistics

Software:

R; ClustGeo; Silhouettes

References:

[1] Adin, A.; Lee, D.; Goicoa, T.; Ugarte, M. D., A two-stage approach to estimate spatial and spatio-temporal disease risks in the presence of local discontinuities and clusters, Stat. Methods. Med. Res., 28, 2595-2613 (2019)
[2] Birant, D.; Kut, A., ST-DBSCAN: An algorithm for clustering spatial-temporal data, Data Knowl. Eng., 60, 208-221 (2007)
[3] Bowman, A. W., An alternative method of cross-validation for the smoothing of density estimates, Biometrika, 71, 353-360 (1984)
[4] Brunsdon, C.; Corcoran, J.; Higgs, G., Visualising space and time in crime patterns: A comparison of methods, Comput. Environ. Urban Syst., 31, 52-75 (2007)
[5] Caliński, T.; Harabasz, J., A dendrite method for cluster analysis, Comm. Statist. Theory Methods, 3, 1-27 (1974) · Zbl 0273.62010
[6] Cao, R.; Li, B.; Wang, Z.; Peng, Z. R.; Tao, S.; Lou, S., Using a distributed air sensor network to investigate the spatiotemporal patterns of PM2.5 concentrations, Environ. Pollut., 264 (2020)
[7] Carvalho, A. X. Y.; Albuquerque, P. H. M.; de Almeida Junior, G. R.; Guimaraes, R. D., Spatial hierarchical clustering, Rev. Bras. Biom., 27, 411-442 (2009)
[8] Chavent, M.; Kuentz-Simonet, V.; Labenne, A.; Saracco, J., ClustGeo: An R package for hierarchical clustering with spatial constraints, Comput. Stat., 33, 1799-1822 (2018) · Zbl 1417.62006
[9] Cheam, A.; Marbac, M.; McNicholas, P., Model-based clustering for spatiotemporal data on air quality monitoring, Environmetrics, 28 (2017)
[10] Chiou, J. M.; Li, P. L., Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B (Statist. Methodol.), 69, 679-699 (2007) · Zbl 07555371
[11] Chiou, J. M.; Li, P. L., Correlation-based functional clustering via subspace projection, J. Am. Stat. Assoc., 103, 1684-1692 (2008) · Zbl 1286.62058
[12] Davies, D. L.; Bouldin, D. W., A cluster separation measure, IEEE Trans. Pattern Anal. Mach. Intell., 2, 224-227 (1979)
[13] Dunn, J. C., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet., 3, 32-57 (1973) · Zbl 0291.68033
[14] Fan, J.; Yim, T. H., A crossvalidation method for estimating conditional densities, Biometrika, 91, 819-834 (2004) · Zbl 1078.62032
[15] Fouedjio, F., A hierarchical clustering method for multivariate geostatistical data, Spat. Stat., 18, 333-351 (2016)
[16] Giraldo, R.; Delicado, P.; Mateu, J., Hierarchical clustering of spatially correlated functional data, Stat. Neerl., 66, 403-421 (2012)
[17] Hall, P.; Racine, J.; Li, Q., Cross-validation and the estimation of conditional probability densities, J. Am. Stat. Assoc., 99, 1015-1026 (2004) · Zbl 1055.62035
[18] Hart, J. D.; Vieu, P., Data-driven bandwidth choice for density estimation based on dependent data, Ann. Statist., 18, 873-890 (1990) · Zbl 0703.62045
[19] Harvey, A.; Oryshchenko, V., Kernel density estimation for time series data, Int. J. Forecast., 28, 3-14 (2012)
[20] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218 (1985)
[21] Ignaccolo, R.; Ghigo, S.; Giovenali, E., Analysis of air quality monitoring networks by functional clustering, Environmetrics, 19, 672-686 (2008)
[22] James, G. M.; Sugar, C. A., Clustering for sparsely sampled functional data, J. Am. Stat. Assoc., 98, 397-408 (2003) · Zbl 1041.62052
[23] Jaya, I. G. N. M.; Folmer, H., Identifying spatiotemporal clusters by means of agglomerative hierarchical clustering and Bayesian regression analysis with spatiotemporally varying coefficients: Methodology and application to dengue disease in Bandung, Indonesia, Geogr. Anal. (2021)
[24] Jung, Y.; Park, H.; Du, D. Z.; Drake, B. L., A decision criterion for the optimal number of clusters in hierarchical clustering, J. Global Optim., 25, 91-111 (2003)
[25] Leroux, B. G.; Lei, X.; Breslow, N., Estimation of disease rates in small areas: A new mixed model for spatial dependence, in M. Elizabeth Halloran and Donald Berry, eds., Statistical Models in Epidemiology, the Environment, and Clinical Trials, Springer, 2000, pp. 179-191. https://link.springer.com/book/. · Zbl 0957.62095
[26] Li, H.; Liu, J.; Yang, Z.; Liu, R. W.; Wu, K.; Wan, Y., Adaptively constrained dynamic time warping for time series classification and clustering, Inf. Sci., 534, 97-116 (2020) · Zbl 1465.62151
[27] Li, Q.; Racine, J. S., Nonparametric estimation of conditional cdf and quantile functions with mixed categorical and continuous data, J. Bus. Econ. Stat., 26, 423-434 (2008)
[28] Liang, M.; Liu, R. W.; Li, S.; Xiao, Z.; Liu, X.; Lu, F., An unsupervised learning method with convolutional auto-encoder for vessel trajectory similarity computation, Ocean Eng., 225 (2021)
[29] MacQueen, J., et al., Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, USA, 1967, pp. 281-297. · Zbl 0214.46201
[30] Milligan, G. W.; Cooper, M. C., A study of the comparability of external criteria for hierarchical cluster analysis, Multivariate Behav. Res., 21, 441-458 (1986)
[31] Pei, T.; Zhou, C.; Zhu, A. X.; Li, B.; Qin, C., Windowed nearest neighbour method for mining spatio-temporal clusters in the presence of noise, Int. J. Geogr. Inf. Sci., 24, 925-948 (2010)
[32] Petrovic, S., A comparison between the Silhouette index and the Davies-Bouldin index in labelling IDS clusters, in Proceedings of the 11th Nordic Workshop of Secure IT Systems, 2006, pp. 53-64.
[33] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. Available at https://www.R-project.org.
[34] Rand, W. M., Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc., 66, 846-850 (1971)
[35] Rodriguez, M. Z.; Comin, C. H.; Casanova, D.; Bruno, O. M.; Amancio, D. R.; Costa, L. d. F.; Rodrigues, F. A., Clustering algorithms: A comparative approach, PLoS One, 14 (2019)
[36] Rousseeuw, P. J., Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20, 53-65 (1987) · Zbl 0636.62059
[37] Rudemo, M., Empirical choice of histograms and kernel density estimators, Scand. J. Statist., 9, 65-78 (1982) · Zbl 0501.62028
[38] Tibshirani, R.; Walther, G.; Hastie, T., Estimating the number of clusters in a data set via the gap statistic, J. R. Stat. Soc. Ser. B (Statist. Methodol.), 63, 411-423 (2001) · Zbl 0979.62046
[39] Ward Jr., J. H., Hierarchical grouping to optimize an objective function, J. Am. Stat. Assoc., 58, 236-244 (1963)
[40] Wei, Q.; She, J.; Zhang, S.; Ma, J., Using individual GPS trajectories to explore foodscape exposure: A case study in Beijing metropolitan area, Int. J. Environ. Res. Public. Health, 15, 405 (2018)
[41] Xu, D.; Tian, Y., A comprehensive survey of clustering algorithms, Ann. Data Sci., 2, 165-193 (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.