An optimal approach towards recognizing broken Thai characters in OCR systems

This paper presents a novel technique for recognizing broken Thai characters found in degraded Thai text documents by modeling it as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components by which each subset yields a reconstructed Thai cha...

Bibliographic Details
Main Authors: Chaivatna Sumetphong, Supachai Tangwongsan
Other Authors: Mahidol University
Format: Conference or Workshop Item
Published: 2012 (repository record dated 2018)
Subjects: Computer Science
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/14005
Institution: Mahidol University
id th-mahidol.14005
record_format dspace
spelling th-mahidol.14005 2018-06-11T11:44:43Z An optimal approach towards recognizing broken Thai characters in OCR systems Chaivatna Sumetphong Supachai Tangwongsan Mahidol University Computer Science This paper presents a novel technique for recognizing broken Thai characters found in degraded Thai text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components by which each subset yields a reconstructed Thai character. Given the non-linear nature of the objective function needed for optimal set-partitioning, we design an algorithm, which we call Heuristic Incremental Integer Programming (HIIP), that employs integer programming (IP) with an incremental approach using heuristics to hasten the convergence. To generate corrected Thai words, we adopt a probabilistic generative approach based on a Thai dictionary corpus. The proposed technique is applied successfully to a Thai historical document and a poor-quality Thai fax document, with promising accuracy rates over 93%. © 2012 IEEE. 2018-06-11T04:44:43Z 2018-06-11T04:44:43Z 2012-12-01 Conference Paper 2012 International Conference on Digital Image Computing Techniques and Applications, DICTA 2012. (2012) 10.1109/DICTA.2012.6411736 2-s2.0-84874352445 https://repository.li.mahidol.ac.th/handle/123456789/14005 Mahidol University SCOPUS https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84874352445&origin=inward
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Computer Science
description This paper presents a novel technique for recognizing broken Thai characters found in degraded Thai text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components by which each subset yields a reconstructed Thai character. Given the non-linear nature of the objective function needed for optimal set-partitioning, we design an algorithm, which we call Heuristic Incremental Integer Programming (HIIP), that employs integer programming (IP) with an incremental approach using heuristics to hasten the convergence. To generate corrected Thai words, we adopt a probabilistic generative approach based on a Thai dictionary corpus. The proposed technique is applied successfully to a Thai historical document and a poor-quality Thai fax document, with promising accuracy rates over 93%. © 2012 IEEE.
author2 Mahidol University
author_facet Mahidol University
Chaivatna Sumetphong
Supachai Tangwongsan
format Conference or Workshop Item
author Chaivatna Sumetphong
Supachai Tangwongsan
author_sort Chaivatna Sumetphong
title An optimal approach towards recognizing broken Thai characters in OCR systems
publishDate 2018
url https://repository.li.mahidol.ac.th/handle/123456789/14005
_version_ 1763497664912031744
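
Illustrative note: the description above frames broken-character recognition as a set-partitioning problem over connected components. The sketch below is not the authors' HIIP algorithm; it is a minimal brute-force illustration of the set-partitioning idea only, under stated assumptions. The fragment labels, the TEMPLATES table, and the scoring rule (mean block score, chosen simply as one example of a non-linear objective) are all hypothetical stand-ins for a real character recognizer.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or let it start a new block of its own.
        yield [[first]] + partition


# Hypothetical recognizer: a set of fragments "reconstructs" a character
# only if it exactly matches one of these toy templates.
TEMPLATES = {
    frozenset({"a1", "a2"}): "A",
    frozenset({"b1"}): "B",
    frozenset({"c1", "c2", "c3"}): "C",
}


def block_score(block):
    """Return 1.0 if the block matches a template, else 0.0 (assumed score)."""
    return 1.0 if frozenset(block) in TEMPLATES else 0.0


def best_partition(components):
    """Exhaustively search all set-partitions for the highest mean block score."""
    best, best_score = None, -1.0
    for partition in set_partitions(components):
        # Mean score per block: not a linear function of block choices,
        # because it divides by the number of blocks in the partition.
        score = sum(block_score(b) for b in partition) / len(partition)
        if score > best_score:
            best, best_score = partition, score
    return best, best_score


if __name__ == "__main__":
    fragments = ["a1", "a2", "b1", "c1", "c2", "c3"]
    partition, score = best_partition(fragments)
    print(sorted(TEMPLATES[frozenset(b)] for b in partition), score)
```

Exhaustive enumeration grows with the Bell numbers (203 partitions for six fragments), which is why a practical system would need something like the integer-programming approach with heuristics that the description mentions.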