UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM

Bibliographic Details
Main Author: Ariq, Irfan
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/39979
Institution: Institut Teknologi Bandung
Library: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia

Description:

Web pages today hold a large amount of highly diverse information. This information would be very useful if it could be processed further, but it is usually unstructured and therefore hard to process. Information extraction (IE) systems address this problem by transforming unstructured or semi-structured information into structured information. There are two kinds of IE system: open information extraction (open IE) and domain-specific information extraction (domain IE). Domain IE is usually more useful than open IE because it extracts information related to a particular domain and discards information unrelated to that domain.

There are several methods for domain IE; one of them is to adapt an open IE system to a domain. In this work, we develop a domain IE system from an existing modular and extensible open IE system developed by Saputra (2018). To adapt it, we add two components to that system: a class recognizer and a domain relation mapper. With these components added, the new system retains the modularity and extensibility of the original open IE system.

The class recognizer recognizes domain-specific classes in the open IE extraction results, using word lists and regular expressions. The domain relation mapper maps the open IE extraction results into domain-specific relations using mapping rules. These rules are generated automatically with a covering algorithm from domain data prepared by the user. With these two components, the new system extracts information related to a domain and ignores information unrelated to it.
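The description names the two added components but not their exact rule format or covering algorithm. The following is a minimal, hypothetical Python sketch of how they could fit together, assuming open IE output as (arg1, relation, arg2) triples: a class recognizer driven by word lists and regular expressions, and a relation mapper whose rules are learned with a generic sequential covering algorithm from user-labelled triples. The example domain, the feature representation, and every name (`CLASSES`, `recognize_class`, `learn_rules`, `map_triple`) are illustrative assumptions, not the thesis's implementation.

```python
import re

# Hypothetical domain classes: each class has a word list and regex patterns.
# (Illustrative format only; the thesis's class definitions may differ.)
CLASSES = {
    "Organization": ({"google", "microsoft", "itb"}, [r"\buniversity\b", r"\b(inc|corp)\.?$"]),
    "City": ({"bandung", "jakarta", "redmond"}, []),
    "Year": (set(), [r"^\d{4}$"]),
}


def recognize_class(phrase):
    """Class recognizer: return the first class whose word list or regex matches."""
    text = phrase.lower().strip()
    for name, (words, patterns) in CLASSES.items():
        if text in words or any(re.search(p, text) for p in patterns):
            return name
    return None  # the phrase belongs to no domain class


def features(triple):
    """Describe an open IE triple (arg1, relation, arg2) as a set of conditions."""
    arg1, relation, arg2 = triple
    feats = {("arg1", recognize_class(arg1)), ("arg2", recognize_class(arg2))}
    feats |= {("rel", word) for word in relation.lower().split()}
    return frozenset(feats)


def precision(cond, covered, target):
    """Score a candidate condition by (precision, number of positives covered)."""
    hits = [label for feats, label in covered if cond in feats]
    pos = sum(label == target for label in hits)
    return (pos / len(hits) if hits else 0.0, pos)


def learn_rules(examples):
    """Generic sequential covering: learn (conditions -> domain relation) rules.

    Each example pairs a triple with a domain relation, or with None when the
    (hypothetical) user marks the triple as unrelated to the domain.
    """
    data = [(features(t), label) for t, label in examples]
    rules = []
    for target in sorted({label for _, label in data if label is not None}):
        uncovered = [f for f, label in data if label == target]
        while uncovered:
            conds, covered = set(), data
            # Specialize the rule until it covers no example with another label.
            while any(label != target for _, label in covered):
                seeds = [f for f in uncovered if conds <= f]
                candidates = set().union(*seeds) - conds
                if not candidates:
                    break  # conflicting examples: keep the rule as it is
                ordered = sorted(candidates, key=repr)  # deterministic tie-breaking
                best = max(ordered, key=lambda c: precision(c, covered, target))
                conds.add(best)
                covered = [(f, label) for f, label in covered if best in f]
            rules.append((frozenset(conds), target))
            # Drop the positives this rule covers, then cover the remaining ones.
            uncovered = [f for f in uncovered if not conds <= f]
    return rules


def map_triple(triple, rules):
    """Domain relation mapper: return the first matching relation, or None."""
    feats = features(triple)
    for conds, relation in rules:
        if conds <= feats:
            return relation
    return None  # unrelated to the domain, so the triple is ignored


# Hypothetical user-prepared domain data.
training = [
    (("Google", "was founded in", "1998"), "founded_in"),
    (("Microsoft", "was founded in", "1975"), "founded_in"),
    (("ITB", "is located in", "Bandung"), "located_in"),
    (("Microsoft", "is based in", "Redmond"), "located_in"),
    (("Google", "was mentioned in", "2001"), None),
]
rules = learn_rules(training)
print(map_triple(("Gadjah Mada University", "was founded in", "1949"), rules))  # founded_in
print(map_triple(("The weather", "is nice in", "July"), rules))  # None
```

With only five training triples the learned rules are broad: here any triple whose second argument is recognized as a city maps to `located_in`. More user-prepared domain data, including triples labelled as unrelated, would push the covering step toward more specific rules.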