UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM

Bibliographic Details
Main Author: Ariq, Irfan
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/39979
Institution: Institut Teknologi Bandung
Description
Summary: The web currently holds a large amount of highly diverse information. This information would be very useful if it could be processed further, but information on web pages is usually unstructured and therefore hard to process. An information extraction system addresses this problem by transforming unstructured or semi-structured information into structured information. There are two kinds of information extraction system: open information extraction (open IE) and domain-specific information extraction (domain IE). Domain IE is usually more useful than open IE because it extracts information related to a domain and discards information unrelated to it. There are several methods for performing domain IE on text, and one of them is adapting an open IE system to a domain.

In this work, we develop a domain IE system from an open IE system, namely the modular and extensible open IE system previously developed by Saputra (2018). To adapt it to domain IE, we add two components: a class recognizer and a domain relation mapper. With these two components added, the new system retains the modularity and extensibility of the original open IE system. The class recognizer is responsible for recognizing domain-specific classes in the open IE extraction results; it does so using word lists and regular expressions. The domain relation mapper is responsible for mapping open IE extraction results to domain-specific relations; it does so using mapping rules, which are generated automatically by a covering algorithm from domain data prepared by the user. With these two components, the new system extracts information related to a domain and ignores information unrelated to it.