Effectively recognizing broken characters in Historical documents
After binarization, images of historical documents contain many broken character pieces. These pieces complicate OCR and sharply reduce overall recognition accuracy. We propose an effective approach to recognize the broken charact...
Main Authors: | Chaivatna Sumetphong, Supachai Tangwongsan |
---|---|
Other Authors: | Mahidol University |
Format: | Conference or Workshop Item |
Published: | 2018 |
Subjects: | Computer Science |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/14031 |
Institution: | Mahidol University |
id |
th-mahidol.14031 |
---|---|
record_format |
dspace |
spelling |
th-mahidol.14031; 2018-06-11T11:45:12Z; Effectively recognizing broken characters in Historical documents; Chaivatna Sumetphong; Supachai Tangwongsan; Mahidol University; Computer Science; Historical documents, after being binarized, produce images that contain abundant broken pieces. The presence of these broken pieces naturally complicates the process of OCR and drastically drops the overall recognition accuracy. We propose a highly effective approach to recognize the broken characters using a heuristic enumerative method to find the optimal set partition of the broken pieces. Each subset of the optimal partition is mapped to the best character pattern and the overall image is recognized. Results obtained after performing experiments on a Thai Historical document and an American Historical document are quite promising. Given the generality of the method, it may be applicable to different language scripts given that a properly trained classifier has been developed for that script and font. © 2012 IEEE.; 2018-06-11T04:45:12Z; 2018-06-11T04:45:12Z; 2012-10-09; Conference Paper; CSAE 2012 - Proceedings, 2012 IEEE International Conference on Computer Science and Automation Engineering. Vol.3, (2012), 104-108; 10.1109/CSAE.2012.6272918; 2-s2.0-84867080115; https://repository.li.mahidol.ac.th/handle/123456789/14031; Mahidol University; SCOPUS; https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84867080115&origin=inward |
institution |
Mahidol University |
building |
Mahidol University Library |
continent |
Asia |
country |
Thailand |
content_provider |
Mahidol University Library |
collection |
Mahidol University Institutional Repository |
topic |
Computer Science |
description |
After binarization, images of historical documents contain many broken character pieces. These pieces complicate OCR and sharply reduce overall recognition accuracy. We propose an effective approach that recognizes broken characters by using a heuristic enumerative method to find the optimal set partition of the broken pieces. Each subset of the optimal partition is mapped to its best-matching character pattern, and the overall image is thereby recognized. Experiments on a Thai historical document and an American historical document gave promising results. Because the method is general, it may apply to other language scripts, provided that a properly trained classifier has been developed for the script and font. © 2012 IEEE. |
author2 |
Mahidol University |
author_facet |
Mahidol University, Chaivatna Sumetphong, Supachai Tangwongsan |
format |
Conference or Workshop Item |
author |
Chaivatna Sumetphong, Supachai Tangwongsan |
author_sort |
Chaivatna Sumetphong |
title |
Effectively recognizing broken characters in Historical documents |
title_sort |
effectively recognizing broken characters in historical documents |
publishDate |
2018 |
url |
https://repository.li.mahidol.ac.th/handle/123456789/14031 |
_version_ |
1763496689523490816 |
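
The description above mentions finding an optimal set partition of the broken pieces and mapping each subset to a character pattern. The record does not say how the authors' heuristic works, so the following is only a brute-force sketch of the general idea, not their method. The names `set_partitions`, `best_partition`, `toy_classify`, and `TEMPLATES`, the string stand-ins for image fragments, and the mean-confidence score are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Piece = str                      # hypothetical stand-in for an image fragment
Partition = List[List[Piece]]
Classifier = Callable[[List[Piece]], Tuple[str, float]]


def set_partitions(items: Sequence[Piece]) -> Iterator[Partition]:
    """Yield every partition of `items` into non-empty subsets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Either `first` forms a new subset on its own...
        yield [[first]] + partial
        # ...or it joins one of the subsets already built.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]


def best_partition(pieces: Sequence[Piece],
                   classify: Classifier) -> Tuple[Partition, List[str]]:
    """Exhaustively pick the partition with the highest mean confidence.

    The mean-confidence objective is an assumption of this sketch; the
    number of partitions grows very quickly, so a real system would need
    a heuristic to prune the search.
    """
    best_score = float("-inf")
    best: Tuple[Partition, List[str]] = ([], [])
    for partition in set_partitions(pieces):
        results = [classify(subset) for subset in partition]
        score = sum(conf for _, conf in results) / max(len(results), 1)
        if score > best_score:
            best_score = score
            best = (partition, [label for label, _ in results])
    return best


# Toy classifier: a subset is recognized only if it matches a template exactly.
TEMPLATES = {frozenset({"a1", "a2"}): "A", frozenset({"b1"}): "B"}


def toy_classify(subset: List[Piece]) -> Tuple[str, float]:
    label = TEMPLATES.get(frozenset(subset))
    return (label, 1.0) if label is not None else ("?", 0.0)


if __name__ == "__main__":
    partition, labels = best_partition(["a1", "a2", "b1"], toy_classify)
    print(partition, labels)
```

In this toy setting the two fragments of "A" are grouped together and "b1" stands alone. A trained classifier for the target script and font would take the place of `toy_classify`.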