A combined template-based and case-based metadata extraction for heterogeneous Thai documents

Nowadays, the number of universities, laboratories, government agencies and companies that place their documents online and make them searchable is increasing because the Internet infrastructure for global data access is fully functional. However, a large number of organizations have documents t...

Bibliographic Details
Main Authors:	Krisda Khankasikam, Nopasit Chakpitak, Thana Udomsripaiboon
Format:	Conference Proceeding
Published:	2018
Subjects:	Computer Science; Engineering
Online Access:	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=64949203967&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/59515
Institution:	Chiang Mai University

id	th-cmuir.6653943832-59515
record_format	dspace
spelling	th-cmuir.6653943832-59515 2018-09-10T03:17:45Z 2018-09-10T03:16:30Z 2018-09-10T03:16:30Z 2009-04-24 2-s2.0-64949203967 10.1109/ICACC.2009.88
institution	Chiang Mai University
building	Chiang Mai University Library
country	Thailand
collection	CMU Intellectual Repository
topic	Computer Science; Engineering
description	Nowadays, the number of universities, laboratories, government agencies and companies that place their documents online and make them searchable is increasing because the Internet infrastructure for global data access is fully functional. However, a large number of organizations have documents that lack metadata. The lack of metadata hinders not only the discovery and dissemination of these documents over the Internet, but also their connectivity with other documents. Unfortunately, manual metadata extraction is expensive and time-consuming for a large number of documents, and most existing automated metadata extraction approaches have focused on specific domains and homogeneous documents. In this paper, we propose a combined case-based and template-based metadata extraction approach to solve these issues. The key idea for handling heterogeneity is to classify documents into equivalent groups so that each document group contains only similar documents. Next, for each document group, we have a template from a previous case that contains a process to extract metadata from documents in the group. © 2008 IEEE.
format	Conference Proceeding
author	Krisda Khankasikam Nopasit Chakpitak Thana Udomsripaiboon
author_sort	Krisda Khankasikam
title	A combined template-based and case-based metadata extraction for heterogeneous Thai documents
publishDate	2018
url	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=64949203967&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/59515
_version_	1681425265543086080