×

Approximate pattern matching in LZ77-compressed texts. (English) Zbl 1328.68327

Summary: Suppose we want to support approximate pattern matching in a text \(T [1..n]\) whose LZ77 parse consists of \(z\) phrases. In this paper we show how, given that parse, we can preprocess \(T\) in \(\mathcal{O}(z \log n)\) time and space such that later, given a pattern \(P [1..m]\) and an edit distance \(k\), we can perform approximate pattern matching in \(\mathcal{O}(z \min (m k, m + k^4) + \mathrm{occ})\) time and \(\mathcal{O}(z \log n + m + \mathrm{occ})\) space, where \(\mathrm{occ}\) is the size of the output.

MSC:

68W32 Algorithms on strings
Full Text: DOI

References:

[1] Amir, A.; Benson, G., Efficient two-dimensional compressed matching, (Proceedings of the Data Compression Conference (DCC) (1992)), 279-288
[2] Bille, P.; Fagerberg, R.; Gørtz, I. L., Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts, ACM Trans. Algorithms, 6, 1 (2009) · Zbl 1300.68070
[3] Bille, P.; Landau, G. M.; Raman, R.; Sadakane, K.; Satti, S. R.; Weimann, O., Random access to grammar-compressed strings, (Proceedings of the 22nd Symposium on Discrete Algorithms (SODA) (2011)), 373-389 · Zbl 1375.68229
[4] Charikar, M.; Lehman, E.; Liu, D.; Panigrahy, R.; Prabhakaran, M.; Sahai, A.; Shelat, A., The smallest grammar problem, IEEE Trans. Inf. Theory, 51, 7, 2554-2576 (2005) · Zbl 1296.68086
[5] Claude, F.; Navarro, G., Improved grammar-based compressed indexes, (Proceedings of the 19th Symposium on String Processing and Information Retrieval (SPIRE) (2012)), 180-192
[6] Cole, R.; Hariharan, R., Approximate string matching: a simpler faster algorithm, SIAM J. Comput., 31, 6, 1761-1782 (2002) · Zbl 1008.68165
[7] Gagie, T.; Gawrychowski, P.; Kärkkäinen, J.; Nekrich, Y.; Puglisi, S. J., A faster grammar-compressed self-index, (Proceedings of the 6th Conference on Language and Automata Theory and Applications (LATA) (2012)), 240-251 · Zbl 1351.68089
[8] Gagie, T.; Gawrychowski, P.; Puglisi, S. J., Faster approximate pattern matching in compressed repetitive texts, (Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC) (2011)), 653-662 · Zbl 1350.68301
[9] Gąsieniec, L.; Kolpakov, R. M.; Potapov, I.; Sant, P., Real-time traversal in grammar-based compressed files
[10] Gąsieniec, L.; Kolpakov, R. M.; Potapov, I.; Sant, P., Real-time traversal in grammar-based compressed files, (Proceedings of the Data Compression Conference (DCC) (2005)), 458
[11] Gawrychowski, P., Pattern matching in Lempel-Ziv compressed strings: fast, simple and deterministic, (Proceedings of the 19th European Symposium on Algorithms (ESA) (2011)), 421-432 · Zbl 1347.68377
[12] Gawrychowski, P.; Straszak, D., Beating \(O(n m)\) in approximate LZW-compressed pattern matching, (Proceedings of the 24th International Symposium on Algorithms and Computation (ISAAC) (2013)), 78-88 · Zbl 1329.68313
[13] Kärkkäinen, J.; Navarro, G.; Ukkonen, E., Approximate string matching on Ziv-Lempel compressed text, J. Discrete Algorithms, 1, 3-4, 313-338 (2003) · Zbl 1100.68127
[14] Kuruppu, S.; Puglisi, S. J.; Zobel, J., Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval, (Proceedings of the 17th Symposium on String Processing and Information Retrieval (SPIRE) (2010)), 201-206 · Zbl 1397.68073
[15] Landau, G. M.; Vishkin, U., Fast parallel and serial approximate string matching, J. Algorithms, 10, 2, 157-169 (1989) · Zbl 0685.68033
[16] Larsson, N. J.; Moffat, A., Offline dictionary-based compression, (Proceedings of the Data Compression Conference (DCC) (1999)), 296-305
[17] Lohrey, M., Algorithmics on SLP-compressed strings: a survey, Groups Complex. Cryptol., 4, 2, 241-299 (2012) · Zbl 1285.68088
[18] Miyazaki, M.; Shinohara, A.; Takeda, M., An improved pattern matching algorithm for strings in terms of straight-line programs, (Proceedings of the 8th Symposium on Combinatorial Pattern Matching (CPM) (1997)), 1-11
[19] Navarro, G., A guided tour to approximate string matching, ACM Comput. Surv., 33, 1, 31-88 (2001)
[20] Rytter, W., Application of Lempel-Ziv factorization to the approximation of grammar-based compression, Theor. Comput. Sci., 302, 1-3, 211-222 (2003) · Zbl 1051.68088
[21] Tiskin, A., Semi-local string comparison: algorithmic techniques and applications (2013), Technical report
[22] Verbin, E.; Yu, W., Data structure lower bounds on random access to grammar-compressed strings, (Proceedings of the 24th Symposium on Combinatorial Pattern Matching (CPM) (2013)), 247-258 · Zbl 1381.68073
[23] Wandelt, S.; Leser, U., String searching in referentially compressed genomes, (Proceedings of the Conference on Knowledge Discovery and Information Retrieval (KDIR) (2012)), 95-102
[24] Welch, T., A technique for high-performance data compression, Computer, 17, 6, 8-19 (1984)
[25] Ziv, J.; Lempel, A., A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory, 23, 3, 337-343 (1977) · Zbl 0379.94010
[26] Ziv, J.; Lempel, A., Compression of individual sequences via variable-rate coding, IEEE Trans. Inf. Theory, 24, 5, 530-536 (1978) · Zbl 0392.94004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.