Effectively recognizing broken characters in Historical documents

Bibliographic Details
Main Authors: Chaivatna Sumetphong, Supachai Tangwongsan
Other Authors: Mahidol University
Format: Conference or Workshop Item
Published: 2018
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/14031
Description
Abstract: Historical documents, after being binarized, produce images that contain abundant broken pieces. The presence of these broken pieces complicates the OCR process and drastically reduces overall recognition accuracy. We propose a highly effective approach to recognizing broken characters that uses a heuristic enumerative method to find the optimal set partition of the broken pieces. Each subset of the optimal partition is mapped to the best-matching character pattern, and the overall image is thereby recognized. Results of experiments on a Thai historical document and an American historical document are quite promising. Given its generality, the method may be applicable to other scripts, provided that a properly trained classifier has been developed for that script and font. © 2012 IEEE.
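The abstract does not describe the authors' heuristic in detail, so the following is only a brute-force sketch of the underlying idea, not their implementation: enumerate every set partition of the broken pieces, score each subset with a classifier, and keep the partition with the lowest total mismatch cost. The piece representation (horizontal pixel extents), the `classify` interface, and the width-based `toy_width_classifier` with its template widths are all hypothetical stand-ins for a properly trained character classifier. Because the number of set partitions grows as the Bell number of the piece count, exhaustive enumeration like this is only practical for a handful of pieces, which is why a heuristic search is needed in practice.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical representation: each broken piece is its horizontal extent
# (left x, right x) in pixels after binarization.
Piece = Tuple[int, int]
# Hypothetical classifier interface: maps a group of pieces to
# (best character label, mismatch cost); a lower cost means a better match.
Classifier = Callable[[List[Piece]], Tuple[str, float]]


def set_partitions(items: Sequence[Piece]) -> Iterator[List[List[Piece]]]:
    """Yield every set partition of `items` (there are Bell(n) of them)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def recognize(pieces: Sequence[Piece], classify: Classifier) -> Tuple[List[str], float]:
    """Exhaustively find the lowest-cost partition; return its labels and cost."""
    best_labels: List[str] = []
    best_cost = float("inf")
    for partition in set_partitions(pieces):
        results = [classify(block) for block in partition]
        cost = sum(c for _, c in results)
        if cost < best_cost:
            # Report recognized characters left to right by each block's left edge.
            order = sorted(range(len(partition)),
                           key=lambda i: min(p[0] for p in partition[i]))
            best_labels = [results[i][0] for i in order]
            best_cost = cost
    return best_labels, best_cost


def toy_width_classifier(block: List[Piece]) -> Tuple[str, float]:
    """Toy stand-in for a trained classifier: compares merged width to templates."""
    templates: Dict[str, int] = {"i": 2, "o": 8}  # hypothetical character widths
    width = max(p[1] for p in block) - min(p[0] for p in block)
    label = min(templates, key=lambda k: abs(templates[k] - width))
    return label, float(abs(templates[label] - width))


if __name__ == "__main__":
    # An "o" broken into two halves, followed by an intact "i".
    pieces = [(0, 4), (4, 8), (11, 13)]
    print(recognize(pieces, toy_width_classifier))  # (['o', 'i'], 0.0)
```

In this toy run, grouping the two halves together and leaving the third piece alone gives a total cost of 0, beating the four other partitions of three pieces. A real system would replace the width comparison with a trained classifier's per-subset match score.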