Line segmentation for grayscale text images of Khmer palm leaf manuscripts

Dona Valy, Michel Verleysen, Kimheng Sok

Research output: Contribution in Book/Catalog/Report/Conference proceeding > Conference contribution

Abstract

Text line segmentation is one of the most essential pre-processing steps in character recognition and document analysis. In ancient documents, a variety of deformations caused by aging produce noise that makes the binarization process very challenging. Moreover, because of irregular layout, such as skew and fluctuation of text lines, segmenting an ancient manuscript page into lines remains an open problem. In this paper, we propose a novel line segmentation scheme for grayscale images of ancient Khmer documents. First, a stroke width transform is applied to extract connected components from the document page. The number and medial positions of text lines are then estimated using a modified piece-wise projection profile technique. Those positions are adjusted adaptively according to the curvature of the actual text lines. Finally, a path finding approach is used to separate touching components and to mark the boundaries of the text lines. Experiments on a dataset of 110 pages of Khmer palm leaf manuscript images compare the robustness of the proposed approach with that of existing methods from the literature.
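
To make the projection-profile step of the abstract more concrete, the following is a minimal Python sketch of a plain piece-wise projection profile. It is an illustration under stated assumptions, not the authors' modified technique: the input is assumed to be an already extracted foreground mask (1 = ink), and the function names, the number of strips, the smoothing window, and the relative peak threshold are all hypothetical choices made for this example.

```python
import numpy as np


def smooth(profile, window):
    """Moving-average smoothing of a 1-D profile (window should be odd)."""
    kernel = np.ones(window) / window
    return np.convolve(profile, kernel, mode="same")


def find_peaks(profile, min_height):
    """Return row indices of local maxima that reach min_height.

    On a flat plateau, only its first (topmost) row is reported.
    """
    peaks = []
    for y in range(1, len(profile) - 1):
        if (profile[y] >= min_height
                and profile[y] > profile[y - 1]
                and profile[y] >= profile[y + 1]):
            peaks.append(y)
    return peaks


def piecewise_line_positions(ink, n_strips=4, window=5, rel_height=0.5):
    """Estimate text line positions separately in each vertical strip.

    ink: 2-D array where 1 marks foreground pixels (assumed input).
    Returns one list of row indices per strip, left to right.
    """
    height, width = ink.shape
    # Split the page width into n_strips vertical strips.
    edges = np.linspace(0, width, n_strips + 1, dtype=int)
    positions = []
    for left, right in zip(edges[:-1], edges[1:]):
        # Horizontal projection profile: ink count per row in this strip.
        profile = ink[:, left:right].sum(axis=1).astype(float)
        profile = smooth(profile, window)
        if profile.max() == 0:
            positions.append([])  # strip contains no ink
            continue
        positions.append(find_peaks(profile, rel_height * profile.max()))
    return positions
```

Because each strip gets its own profile, a skewed or wavy text line shows up as peak rows that drift from one strip to the next, which is the property a piece-wise profile relies on when a single page-wide profile would blur neighbouring lines together.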

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Image Processing Theory, Tools and Applications, IPTA 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
Volume: 2018-January
ISBN (Electronic): 9781538618417
Publication status: Published - 8 Mar 2018
Externally published: Yes
Event: 7th International Conference on Image Processing Theory, Tools and Applications, IPTA 2017 - Montreal, Canada
Duration: 28 Nov 2017 to 1 Dec 2017

Publication series

Name: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)

Conference

Conference: 7th International Conference on Image Processing Theory, Tools and Applications, IPTA 2017
Country/Territory: Canada
City: Montreal
Period: 28/11/17 to 1/12/17

Keywords

  • handwritten document analysis
  • palm leaf manuscript
  • text line segmentation
