×

An improved pattern matching technique for lossy/lossless compression of binary printed Farsi and Arabic textual images. (English) Zbl 1185.68594

Summary: The purpose of this paper is to propose a lossy/lossless binary textual image compression method based on an improved Pattern Matching (PM) technique.
In the Farsi/Arabic script, contrary to the printed Latin script, letters usually attach together and produce various patterns. Hence, some patterns are fully or partially subsets of some others. Two new ideas are proposed here. First, the number of library prototypes is reduced by detecting and then removing the fully or partially similar prototypes. Second, a new effective pattern encoding scheme is proposed for all types of patterns including text and graphics. The new encoding scheme has two operation modes of chain coding and soft PM, depending on the ratio of the pattern area to its chain code effective length. In order to encode the number sequences, the authors have modified the multi-symbol QM-coder. The proposed method has three levels for the lossy compression. Each level, in its turn, further increases the compression ratio. The first level includes applying some processing in the chain code domain such as omission of small patterns and holes, omission of inner holes of characters, and smoothing the boundaries of the patterns. The second level includes the selective pixel reversal technique, and the third level includes using the proposed method of prioritizing the residual patterns for encoding, with respect to their degree of compactness.
Experimental results show that the compression performance of the proposed method is considerably better than that of the best existing binary textual image compression methods as high as 1.6-3 times in the lossy case and 1.3-2.4 times in the lossless case at 300 dpi. The maximum compression ratios are achieved for Farsi and Arabic textual images.
Only the binary printed typeset textual images are considered.
The proposed method has a high-compression ratio for archiving and storage applications.
To the authors’ best knowledge, the existing textual image compression methods or standards have not so far exploited the property of full or partial similarity of prototypes for increasing the compression ratio for any scripts. Also, the idea of combining the boundary description methods with the run-length and arithmetic coding techniques has not so far been used.

MSC:

68T10 Pattern recognition, speech recognition
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68U10 Computing methodologies for image processing
Full Text: DOI

References:

[1] DOI: 10.1109/T-C.1974.223826 · Zbl 0288.68048 · doi:10.1109/T-C.1974.223826
[2] DOI: 10.1117/1.482609 · doi:10.1117/1.482609
[3] DOI: 10.1147/rd.266.0681 · doi:10.1147/rd.266.0681
[4] Cirrincione, G. and Cirrincione, M. (2007), ”Neural networks for matching in computer vision”,KES 2007, pp. 688-95. · Zbl 1008.68141 · doi:10.1007/978-3-540-74819-9_85
[5] Eggert, J., Zhang, C. and Corner, E. (2007), ”Template matching for large transformations”,ICANN 2007, pp. 169-79. · doi:10.1007/978-3-540-74695-9_18
[6] Holt, M.J.J. and Xydeas, C.S. (1986), ”Recent developments in image data compression for digital facsimile”,ICL Tech. J., pp. 123-46, May.
[7] DOI: 10.1093/comjnl/40.2_and_3.146 · doi:10.1093/comjnl/40.2_and_3.146
[8] DOI: 10.1109/76.735380 · doi:10.1109/76.735380
[9] DOI: 10.1002/j.1538-7305.1983.tb03192.x · doi:10.1002/j.1538-7305.1983.tb03192.x
[10] Kanungo, T., Haralick, R.M. and Phillips, I.T. (1993), ”Global and local document degradation models”,Proc. ICDAR, pp. 730-4. · doi:10.1109/ICDAR.1993.395633
[11] DOI: 10.1109/83.846239 · doi:10.1109/83.846239
[12] DOI: 10.1006/cviu.1998.0682 · doi:10.1006/cviu.1998.0682
[13] Kok, C.W. and Nguyen, T.Q. (1996), ”Document image compression by sub-band system”,IEEE ISCAS ’96, Atlanta, GA, USA, Vol. 2, pp. 688-91.
[14] Marimon, D. and Ebrahimi, T. (2007), ”Efficient rotation-discriminative template matching”,CIARP 2007, Chile, pp. 221-30.
[15] DOI: 10.1108/17563780810919122 · Zbl 1178.68458 · doi:10.1108/17563780810919122
[16] DOI: 10.1109/PROC.1980.11744 · doi:10.1109/PROC.1980.11744
[17] Song, J., Chen, B., Chi, Z., Qiu, X. and Wang, W. (2007), ”Face recognition based on binary template matching”,ICIC 2007, Qingdao, pp. 1131-9. · doi:10.1007/978-3-540-74171-8_115
[18] Stefano, L.D., Mattoccia, S. and Tombari, F. (2004), ”An algorithm for efficient and exhaustive template matching”,ICIAR 2004, Porto, pp. 408-15. · doi:10.1007/978-3-540-30125-7_51
[19] DOI: 10.1007/978-3-540-76386-4_60 · doi:10.1007/978-3-540-76386-4_60
[20] DOI: 10.1109/5.286192 · doi:10.1109/5.286192
[21] DOI: 10.1016/S0031-3203(99)00112-0 · doi:10.1016/S0031-3203(99)00112-0
[22] DOI: 10.1109/83.923278 · Zbl 1036.68509 · doi:10.1109/83.923278
[23] DOI: 10.1109/TIP.2003.815253 · doi:10.1109/TIP.2003.815253
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.