Character segmentation from epigraphical images helps an optical character recognizer (OCR) in both training and recognition of old regional scripts. The scripts or characters in such images are often illegible and may sit on complex, noisy background textures. In this paper, we present an automated method for segmenting and extracting characters from digitized inscriptions. To achieve this, machine learning models are employed to discern correctly segmented characters from partially segmented ones. The proposed method first recursively crops the document by sliding a window across the image from top to bottom and extracting the content within the window. This yields a number of small images for classification, each of which is classified as character or non-character based on the features it contains. The model was tested on a wide range of input images containing irregular, inconsistently spaced, handwritten, and inscribed characters.
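The sliding-window cropping and character/non-character classification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the images are assumed to be grayscale NumPy arrays, and the simple ink-ratio heuristic in `is_character` stands in for the trained machine learning classifier used in the paper.

```python
import numpy as np

def sliding_window_segments(image, win_h, stride):
    """Crop horizontal strips of height win_h from top to bottom."""
    segments = []
    for top in range(0, image.shape[0] - win_h + 1, stride):
        segments.append(image[top:top + win_h, :])
    return segments

def is_character(segment, ink_threshold=0.05):
    # Hypothetical stand-in for the paper's learned classifier:
    # label a segment as "character" if enough dark (ink) pixels
    # are present relative to the segment area.
    ink_ratio = np.mean(segment < 128)
    return ink_ratio >= ink_threshold

# Usage on a synthetic 100x40 "inscription": white background with
# one dark band simulating character strokes.
img = np.full((100, 40), 255, dtype=np.uint8)
img[30:50, 5:35] = 0                      # simulated strokes
segs = sliding_window_segments(img, win_h=20, stride=20)
labels = [is_character(s) for s in segs]  # one label per cropped strip
```

In practice the window height, stride, and classifier would be tuned to the script and image resolution; overlapping strides help when character spacing is irregular.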
Published on: Aug 4, 2021 Pages: 45-52
DOI: 10.17352/tcsit.000039