
(Quasi)periodicity quantification in video data, using topology. (English) Zbl 1401.65025

Summary: This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with noncommensurate periods) that do not require segmentation, training, object tracking, or 1-dimensional surrogate signals; the methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology) by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show that our scores are robust to several noise models and noise levels; that our periodicity score outperforms other methods when compared against human-generated periodicity rankings; and that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.
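The delay-embedding step named in the summary can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: each frame is assumed to be a flattened list of grayscale intensities, the delay `tau` is assumed to be an integer number of frames, and the helper names `sliding_window` and `dist` are hypothetical. The persistent homology computation (the record lists Ripser as the software used) and the paper's actual periodicity and quasiperiodicity scores are not reproduced here.

```python
import math
from typing import List

Frame = List[float]  # assumption: one video frame flattened to pixel intensities


def sliding_window(frames: List[Frame], dim: int, tau: int = 1) -> List[List[float]]:
    """Hypothetical helper (not the paper's code): delay-embed a video.

    Point t is the concatenation of frames t, t+tau, ..., t+(dim-1)*tau.
    """
    span = (dim - 1) * tau
    if dim < 1 or tau < 1 or span >= len(frames):
        raise ValueError("window does not fit in the video")
    points = []
    for t in range(len(frames) - span):
        point: List[float] = []
        for k in range(dim):
            point.extend(frames[t + k * tau])
        points.append(point)
    return points


def dist(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two embedded points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy periodic "video": 2x2 frames whose pixels oscillate with period 8 frames,
# each pixel with its own phase offset.
period = 8
frames = [
    [math.cos(2 * math.pi * t / period + p) for p in (0.0, 0.5, 1.0, 1.5)]
    for t in range(40)
]
X = sliding_window(frames, dim=4, tau=2)

print(len(X), len(X[0]))               # → 34 16
# Points one period apart coincide (up to rounding), so the trajectory closes up.
print(round(dist(X[0], X[period]), 9))  # → 0.0
```

For a periodic video, embedded points one period apart coincide, so the point cloud traces a closed loop; per the summary, persistent homology is then used to measure how circular (periodic) or toroidal (quasiperiodic) such a point cloud is.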

MSC:

65D18 Numerical aspects of computer graphics, image analysis, and computational geometry

Software:

Ripser

References:

[1] M. Allmen and C. R. Dyer, Cyclic motion detection using spatiotemporal surfaces and curves, in Proceedings of the 10th International Conference on Pattern Recognition, Vol. 1, IEEE, 1990, pp. 365–370. · Zbl 0825.68580
[2] J. Atanbori, P. Cowling, J. Murray, B. Colston, P. Eady, D. Hughes, I. Nixon, and P. Dickinson, Analysis of bat wing beat frequency using Fourier transform, in Computer Analysis of Images and Patterns. Part II, Springer, Heidelberg, 2013, pp. 370–377.
[3] U. Bauer, Ripser: A Lean C++ Code for the Computation of Vietoris-Rips Persistence Barcodes, (accessed 2015–2017).
[4] E. F. Briefer, A.-L. Maigrot, R. Mandel, S. B. Freymond, I. Bachmann, and E. Hillmann, Segregation of information about emotional arousal and valence in horse whinnies, Sci. Rep., 5 (2015), 9989.
[5] R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal., 21 (2006), pp. 5–30. · Zbl 1095.68094
[6] M. J. Crump, J. V. McDonnell, and T. M. Gureckis, Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research, PLoS ONE, 8 (2013), e57410.
[7] R. Cutler and L. S. Davis, Robust real-time periodic motion detection, analysis, and applications, IEEE Trans. Pattern Anal. Mach. Intell., 22 (2000), pp. 781–796.
[8] A. De Cheveigné and H. Kawahara, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am., 111 (2002), pp. 1917–1930.
[9] M. Delbracio and G. Sapiro, Removing camera shake via weighted Fourier burst accumulation, IEEE Trans. Image Process., 24 (2015), pp. 3293–3307. · Zbl 1408.94125
[10] D. D. Deliyski, P. P. Petrushev, H. S. Bonilha, T. T. Gerlach, B. Martin-Harris, and R. E. Hillman, Clinical implementation of laryngeal high-speed videoendoscopy: Challenges and evolution, Folia Phoniatr. Logop., 60 (2008), pp. 33–44.
[11] R. Goldenberg, R. Kimmel, E. Rivlin, and M. Rudzsky, Behavior classification by eigendecomposition of periodic motions, Pattern Recogn., 38 (2005), pp. 1033–1043.
[12] J. P. Gollub and H. L. Swinney, Onset of turbulence in a rotating fluid, Phys. Rev. Lett., 35 (1975), 927.
[13] A. Hatcher, Algebraic Topology, Cambridge University Press, Cambridge, UK, 2002. · Zbl 1044.55001
[14] C. T. Herbst, J. Unger, H. Herzel, J. G. Švec, and J. Lohscheller, Phasegram analysis of vocal fold vibration documented with laryngeal high-speed video endoscopy, J. Voice, 30 (2016), pp. 771.e1–771.e15.
[15] H. Herzel, D. Berry, I. R. Titze, and M. Saleh, Analysis of vocal disorders with methods from nonlinear dynamics, J. Speech Lang. Hear. Res., 37 (1994), pp. 1008–1019.
[16] H. Herzel and R. Reuter, Biphonation in voice signals, in Chaotic, Fractal, and Nonlinear Signal Processing, AIP Conf. Proc. 375, AIP Publishing, Melville, NY, 1996, pp. 644–657.
[17] P. Huang, A. Hilton, and J. Starck, Shape similarity for 3D video sequences of people, Int. J. Comput. Vis., 89 (2010), pp. 362–381.
[18] S. Huang, X. Ying, J. Rong, Z. Shang, and H. Zha, Camera calibration from periodic motion of a pedestrian, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3025–3033.
[19] X. Jiang, L.-H. Lim, Y. Yao, and Y. Ye, Statistical ranking and combinatorial Hodge theory, Math. Program., 127 (2011), pp. 203–244. · Zbl 1210.90142
[20] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, Cambridge, UK, 2004. · Zbl 1050.62093
[21] M. G. Kendall, A new measure of rank correlation, Biometrika, 30 (1938), pp. 81–93. · Zbl 0019.13001
[22] M. B. Kennel, R. Brown, and H. D. Abarbanel, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Phys. Rev. A, 45 (1992), 3403.
[23] O. Kumdee and P. Ritthipravat, Repetitive motion detection for human behavior understanding from video images, in Proceedings of the 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2015, pp. 484–489.
[24] O. Levy and L. Wolf, Live repetition counting, in Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015, pp. 3020–3028.
[25] J. Lohscheller, H. Toy, F. Rosanowski, U. Eysholdt, and M. Döllinger, Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos, Med. Image Anal., 11 (2007), pp. 400–413.
[26] P. McLeod and G. Wyvill, A smarter way to find pitch, in Proceedings of the International Computer Music Conference (ICMC05), 2005, pp. 138–141.
[27] D. D. Mehta, D. D. Deliyski, T. F. Quatieri, and R. E. Hillman, Automated measurement of vocal fold vibratory asymmetry from high-speed videoendoscopy recordings, J. Speech Lang. Hear. Res., 54 (2011), pp. 47–54.
[28] G. A. Miller, The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychol. Rev., 63 (1956), pp. 81–97.
[29] J. Neubauer, P. Mergell, U. Eysholdt, and H. Herzel, Spatio-temporal analysis of irregular vocal fold oscillations: Biphonation due to desynchronization of spatial modes, J. Acoust. Soc. Am., 110 (2001), pp. 3179–3192.
[30] S. A. Niyogi and E. H. Adelson, Analyzing and recognizing walking figures in XYT, in Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’94), 1994, pp. 469–474.
[31] J. A. Perea, Persistent homology of toroidal sliding window embeddings, in Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 6435–6439.
[32] J. A. Perea and J. Harer, Sliding windows and persistence: An application of topological methods to signal analysis, Found. Comput. Math., 15 (2015), pp. 799–838. · Zbl 1325.37054
[33] M. A. Pinsky, Introduction to Fourier Analysis and Wavelets, Grad. Stud. Math. 102, American Mathematical Society, Providence, RI, 2009. · Zbl 1168.42001
[34] A. M. Plotnik and S. M. Rock, Quantification of cyclic motion of marine animals from computer vision, in OCEANS ’02 MTS/IEEE, Vol. 3, IEEE, 2002, pp. 1575–1581.
[35] R. Polana and R. C. Nelson, Detection and recognition of periodic, nonrigid motion, Int. J. Comput. Vis., 23 (1997), pp. 261–282.
[36] Q. Qiu, H. Schutte, L. Gu, and Q. Yu, An automatic method to quantify the vibration properties of human vocal folds via videokymography, Folia Phoniatr. Logop., 55 (2003), pp. 128–136.
[37] M. Robinson, A topological low-pass filter for quasi-periodic signals, IEEE Signal Process. Lett., 23 (2016), pp. 1771–1775.
[38] C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: A local SVM approach, in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 3, 2004, pp. 32–36.
[39] S. M. Seitz and C. R. Dyer, View-invariant analysis of cyclic motion, Int. J. Comput. Vis., 25 (1997), pp. 231–251.
[40] F. Takens, Detecting strange attractors in turbulence, in Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Math. 898, Springer, Berlin, New York, 1981, pp. 366–381. · Zbl 0513.58032
[41] C. J. Tralie, High dimensional geometry of sliding window embeddings of periodic videos, in Proceedings of the 32nd International Symposium on Computational Geometry, LIPIcs. Leibniz Int. Proc. Inform. 51, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Germany, 2016, 71. · Zbl 1387.68277
[42] C. J. Tralie, Geometric Multimedia Time Series, Ph.D. thesis, Department of Electrical and Computer Engineering, Duke University, Durham, NC, 2017.
[43] M. Turk and A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci., 3 (1991), pp. 71–86.
[44] M. Vejdemo-Johansson, F. T. Pokorny, P. Skraba, and D. Kragic, Cohomological learning of periodic motion, Appl. Algebra Engrg. Comm. Comput., 26 (2015), pp. 5–26. · Zbl 1331.68236
[45] V. Venkataraman and P. Turaga, Shape descriptions of nonlinear dynamical systems for video-based inference, IEEE Trans. Pattern Anal. Mach. Intell., 38 (2016), pp. 2531–2543.
[46] P. Wang, G. D. Abowd, and J. M. Rehg, Quasi-periodic event analysis for social game retrieval, in Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 112–119.
[47] I. Wilden, H. Herzel, G. Peters, and G. Tembrock, Subharmonics, biphonation, and deterministic chaos in mammal vocalization, Bioacoustics, 9 (1998), pp. 171–196.
[48] T. Wittenberg, M. Moser, M. Tigges, and U. Eysholdt, Recording, processing, and analysis of digital high-speed sequences in glottography, Mach. Vis. Appl., 8 (1995), pp. 399–404.
[49] O. Yair, R. Talmon, R. R. Coifman, and I. G. Kevrekidis, No Equations, No Parameters, No Variables: Data, and the Reconstruction of Normal Forms by Learning Informed Observation Geometries, preprint, 2016.
[50] J. Yang, H. Zhang, and G. Peng, Time-domain period detection in short-duration videos, Signal Image Video Process., 10 (2016), pp. 695–702.
[51] G. Yu, G. Sapiro, and S. Mallat, Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity, IEEE Trans. Image Process., 21 (2012), pp. 2481–2499. · Zbl 1373.94471
[52] S. R. Zacharias, C. M. Myer, J. Meinzen-Derr, L. Kelchner, D. D. Deliyski, and A. de Alarcón, Comparison of videostroboscopy and high-speed videoendoscopy in evaluation of supraglottic phonation, Ann. Otol. Rhinol. Laryngol., 125 (2016), pp. 829–837.
[53] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), pp. 249–274. · Zbl 1069.55003