A highly effective approach for document page layout extraction system

In this paper, we propose a highly effective scheme for a document page layout extraction system as part of the character recognition process. The working model has three stages: document segmentation, document layout classification, and document reading order determination. In the first st...

Bibliographic Details
Main Authors:	Supachai Tangwongsan, Cholticha Boondireke
Other Authors:	Mahidol University
Format:	Conference or Workshop Item
Published:	2018
Subjects:	Computer Science
Online Access:	https://repository.li.mahidol.ac.th/handle/123456789/31586
Institution:	Mahidol University

id	th-mahidol.31586
record_format	dspace
spelling	th-mahidol.31586 2018-10-19T11:50:25Z A highly effective approach for document page layout extraction system Supachai Tangwongsan Cholticha Boondireke Mahidol University Computer Science © 2013 IEEE. 2018-10-19T04:50:25Z 2018-10-19T04:50:25Z 2013-12-01 Conference Paper 2013 10th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2013. (2013), 85-90 10.1109/ICCWAMTIP.2013.6716605 2-s2.0-84894207989 https://repository.li.mahidol.ac.th/handle/123456789/31586 Mahidol University SCOPUS https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84894207989&origin=inward
institution	Mahidol University
building	Mahidol University Library
continent	Asia
country	Thailand
content_provider	Mahidol University Library
collection	Mahidol University Institutional Repository
topic	Computer Science
spellingShingle	Computer Science Supachai Tangwongsan Cholticha Boondireke A highly effective approach for document page layout extraction system
description	In this paper, we propose a highly effective scheme for a document page layout extraction system as part of the character recognition process. The working model has three stages: document segmentation, document layout classification, and document reading order determination. In the first stage, a hybrid document segmentation decomposes a page of the document image into a variety of blocks by combining diagonal white runs and vertical edge segmentation with a modified histogram projection. Next, features related to the geometric layout of the page are extracted by feature analysis and combined with a rule-based approach to classify block types and attributes. In the third stage, an efficient algorithm, block order sequencing search (BOSS), is introduced to determine the correct reading sequence of the blocks in the page. The model is tested on a large number of sample bilingual documents in Thai and English with different geometric patterns, multiple columns, rows, fonts, and sizes. The results are promising, with an accuracy rate of 99.47% and an average speed of 2.887 seconds per page in the experiment. © 2013 IEEE.
author2	Mahidol University
author_facet	Mahidol University Supachai Tangwongsan Cholticha Boondireke
format	Conference or Workshop Item
author	Supachai Tangwongsan Cholticha Boondireke
author_sort	Supachai Tangwongsan
title	A highly effective approach for document page layout extraction system
title_sort	highly effective approach for document page layout extraction system
publishDate	2018
url	https://repository.li.mahidol.ac.th/handle/123456789/31586
_version_	1763488918371565568
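
The description above names histogram projection as one ingredient of the segmentation stage. As a rough illustration only, the sketch below shows a plain (unmodified) projection-profile split on a binary page image. It is not the authors' hybrid method, it does not implement diagonal white runs, vertical edge segmentation, or BOSS, and every name in it (projection_profile, split_on_gaps, segment_blocks, min_gap) is hypothetical. The page is assumed to be a list of rows with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed input: rows of 0 (background) / 1 (ink)


def projection_profile(image: Image, axis: int) -> List[int]:
    """Count ink pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def split_on_gaps(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of non-empty bins.

    A run ends once at least `min_gap` consecutive empty bins are seen.
    """
    runs = []
    start = None
    gap = 0
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the final run, trimming any trailing empty bins.
        runs.append((start, len(profile) - gap))
    return runs


def segment_blocks(image: Image, min_gap: int = 1) -> List[Tuple[int, int, int, int]]:
    """Split into horizontal bands, then split each band into columns.

    Returns (top, left, bottom, right) boxes with half-open bounds.
    """
    blocks = []
    for top, bottom in split_on_gaps(projection_profile(image, 0), min_gap):
        band = image[top:bottom]
        for left, right in split_on_gaps(projection_profile(band, 1), min_gap):
            blocks.append((top, left, bottom, right))
    return blocks


page = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
blocks = segment_blocks(page)
```

The boxes come out top to bottom and, within a band, left to right. That is only a naive ordering and should not be read as the reading-order search the description refers to.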
