A combined template-based and case-based metadata extraction for heterogeneous Thai documents

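Nowadays, a growing number of universities, laboratories, government agencies and companies are placing their documents online and making them searchable, because the Internet infrastructure for global data access is fully functional. However, many organizations have documents that lack metadata. The lack of metadata hinders not only the discovery and dissemination of these documents over the Internet but also their connectivity with other documents. Unfortunately, manual metadata extraction is expensive and time-consuming for large document collections, and most existing automated metadata extraction approaches have focused on specific domains and homogeneous documents. In this paper, we propose a combined case-based and template-based metadata extraction approach to solve these issues. The key idea for handling heterogeneity is to classify documents into equivalence groups so that each group contains only similar documents. Then, for each document group, a template from a previous case holds the process for extracting metadata from the documents in that group. © 2008 IEEE.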
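As a rough illustration of the two-stage idea in the abstract, the sketch below first assigns a document to a group of similar documents and then runs that group's stored extraction template. It is written in Python and is not the authors' implementation. The `Case` structure, the Jaccard token similarity, the regex-based templates and the sample cases are all hypothetical choices made only for illustration. The naive regex tokenizer would also not segment Thai text, which is written without spaces between words.

```python
import re
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical stored case: one document group and its extraction template."""
    name: str
    features: set[str]        # tokens characterizing documents in this group
    template: dict[str, str]  # metadata field -> regex with one capture group


def tokenize(text: str) -> set[str]:
    # Naive tokenizer (assumption); Thai text would need a real word segmenter.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity between two token sets (an illustrative choice).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_case(doc: str, cases: list[Case]) -> Case:
    # Classification step: pick the stored case whose group best matches the document.
    tokens = tokenize(doc)
    return max(cases, key=lambda case: jaccard(tokens, case.features))


def extract_metadata(doc: str, cases: list[Case]) -> tuple[str, dict[str, str]]:
    # Extraction step: apply the retrieved group's template to the document.
    case = retrieve_case(doc, cases)
    metadata = {}
    for field_name, pattern in case.template.items():
        match = re.search(pattern, doc, re.MULTILINE)
        if match:  # fields the template cannot find are simply left out
            metadata[field_name] = match.group(1).strip()
    return case.name, metadata


# Hypothetical example cases; real groups and templates would come from the corpus.
CASES = [
    Case("thesis", tokenize("thesis degree advisor department graduate school"),
         {"title": r"^Title:\s*(.+)$", "author": r"^Student:\s*(.+)$"}),
    Case("memo", tokenize("memo memorandum to from subject date"),
         {"title": r"^Subject:\s*(.+)$", "author": r"^From:\s*(.+)$"}),
]

if __name__ == "__main__":
    doc = "Memorandum\nFrom: Finance Office\nTo: All staff\nSubject: Budget review 2009\n"
    # prints ('memo', {'title': 'Budget review 2009', 'author': 'Finance Office'})
    print(extract_metadata(doc, CASES))
```

In this sketch, supporting a new document group only means adding another `Case`. A full case-based reasoning system would typically also revise and retain cases over time, and that part is not shown here.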

Bibliographic Details
Main Authors: Khankasikam K., Chakpitak N., Udomsripaiboon T.
Format: Conference or Workshop Item
Language: English
Published: 2014
Online Access:http://www.scopus.com/inward/record.url?eid=2-s2.0-64949203967&partnerID=40&md5=9519e8c57260709d7bc921591adfd194
http://cmuir.cmu.ac.th/handle/6653943832/954
Institution: Chiang Mai University
id th-cmuir.6653943832-954
spelling th-cmuir.6653943832-954; 2014-08-29T09:09:58Z; Conference Paper, 2009; ISBN 9780769535166; DOI 10.1109/ICACC.2009.88; 75859
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository