TY - GEN
T1 - Line segmentation for grayscale text images of Khmer palm leaf manuscripts
AU - Valy, Dona
AU - Verleysen, Michel
AU - Sok, Kimheng
PY - 2018/3/8
Y1 - 2018/3/8
N2 - Text line segmentation is one of the most essential pre-processing steps in character recognition and document analysis. In ancient documents, a variety of deformations caused by aging produce noise, which makes the binarization process very challenging. Moreover, because of irregular layout such as skew and fluctuation of text lines, segmenting an ancient manuscript page into lines remains an open problem. In this paper, we propose a novel line segmentation scheme for grayscale images of ancient Khmer documents. First, a stroke width transform is applied to extract connected components from the document page. The number and medial positions of text lines are then estimated using a modified piece-wise projection profile technique. These positions are adjusted adaptively according to the curvature of the actual text lines. Finally, a path-finding approach is used to separate touching components and to mark the boundaries of the text lines. Experiments are conducted on a dataset of 110 pages of Khmer palm leaf manuscript images, comparing the robustness of the proposed approach with existing methods from the literature.
AB - Text line segmentation is one of the most essential pre-processing steps in character recognition and document analysis. In ancient documents, a variety of deformations caused by aging produce noise, which makes the binarization process very challenging. Moreover, because of irregular layout such as skew and fluctuation of text lines, segmenting an ancient manuscript page into lines remains an open problem. In this paper, we propose a novel line segmentation scheme for grayscale images of ancient Khmer documents. First, a stroke width transform is applied to extract connected components from the document page. The number and medial positions of text lines are then estimated using a modified piece-wise projection profile technique. These positions are adjusted adaptively according to the curvature of the actual text lines. Finally, a path-finding approach is used to separate touching components and to mark the boundaries of the text lines. Experiments are conducted on a dataset of 110 pages of Khmer palm leaf manuscript images, comparing the robustness of the proposed approach with existing methods from the literature.
KW - handwritten document analysis
KW - palm leaf manuscript
KW - text line segmentation
UR - http://www.scopus.com/inward/record.url?scp=85050655936&partnerID=8YFLogxK
U2 - 10.1109/ipta.2017.8310097
DO - 10.1109/ipta.2017.8310097
M3 - Conference contribution
AN - SCOPUS:85050655936
VL - 2018-January
T3 - 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)
SP - 1
EP - 6
BT - Proceedings of the 7th International Conference on Image Processing Theory, Tools and Applications, IPTA 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Image Processing Theory, Tools and Applications, IPTA 2017
Y2 - 28 November 2017 through 1 December 2017
ER -