A combined template-based and case-based metadata extraction for heterogeneous Thai documents

Bibliographic Details
Main Authors: Krisda Khankasikam, Nopasit Chakpitak, Thana Udomsripaiboon
Format: Conference Proceeding
Published: 2018
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=64949203967&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/59515
Institution: Chiang Mai University
Description
Summary: The number of universities, laboratories, government agencies, and companies that place their documents online and make them searchable is increasing, because the Internet infrastructure for global data access is now fully functional. However, many organizations have documents that lack metadata. This lack of metadata hinders not only the discovery and dissemination of these documents over the Internet, but also their connectivity with other documents. Manual metadata extraction is expensive and time-consuming for large document collections, and most existing automated metadata extraction approaches focus on specific domains and homogeneous documents. In this paper, we propose a combined case-based and template-based metadata extraction approach to address these issues. The key idea for handling heterogeneity is to classify documents into equivalent groups so that each group contains only similar documents. Then, for each document group, a template from a previous case provides the process for extracting metadata from the documents in that group. © 2008 IEEE.
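
The two steps named in the summary (group a document with similar ones, then apply that group's stored template) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which features or template format the paper uses. The `Case` class, the keyword-overlap similarity, the regex templates, and the sample documents are all hypothetical stand-ins chosen for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class Case:
    """A stored case: a group signature plus the template used for that group."""
    name: str
    keywords: frozenset   # hypothetical group signature (assumption)
    template: dict        # field name -> regex with one capture group (assumption)


def similarity(tokens: set, case: Case) -> float:
    """Jaccard overlap between a document's tokens and a case signature."""
    union = tokens | case.keywords
    return len(tokens & case.keywords) / len(union) if union else 0.0


def retrieve_case(text: str, cases: list) -> Case:
    """Case-based step: pick the stored case most similar to the document."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return max(cases, key=lambda c: similarity(tokens, c))


def extract_metadata(text: str, cases: list) -> dict:
    """Template-based step: apply the retrieved case's template to the text."""
    case = retrieve_case(text, cases)
    metadata = {"group": case.name}
    for field, pattern in case.template.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            metadata[field] = match.group(1).strip()
    return metadata


# Two illustrative document groups, each with its own extraction template.
CASES = [
    Case("thesis", frozenset({"thesis", "advisor", "degree"}),
         {"title": r"^Title:\s*(.+)$", "author": r"^Author:\s*(.+)$"}),
    Case("memo", frozenset({"memo", "from", "to", "subject"}),
         {"title": r"^Subject:\s*(.+)$", "author": r"^From:\s*(.+)$"}),
]

doc = "MEMO\nFrom: Somchai P.\nTo: All staff\nSubject: Server maintenance\n"
print(extract_metadata(doc, CASES))
```

The point of the sketch is the separation of concerns: heterogeneity is handled once, at retrieval time, so each template only has to cope with documents that already resemble one another.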