UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM

A great deal of information is now available on web pages, and it is very diverse. This information would be very useful if it could be processed further. However, information on web pages is usually unstructured and hard to process. Therefore, a system wh...

Bibliographic Details
Main Author:	Ariq, Irfan
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/39979
Institution:	Institut Teknologi Bandung

id	id-itb.:39979
spelling	id-itb.:39979 2019-06-28T14:56:11Z UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM Ariq, Irfan Indonesia Final Project information extraction system, domain, class recognizer, domain relation mapper INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/39979 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	A great deal of information is now available on web pages, and it is very diverse. This information would be very useful if it could be processed further, but it is usually unstructured and therefore hard to process. Information extraction systems were developed to address this problem: they transform unstructured or semi-structured information into structured form. There are two kinds of information extraction system: open information extraction (open IE) and domain-specific information extraction (domain IE). Domain IE is usually more useful than open IE because it extracts the information related to a domain and discards information unrelated to it. There are several methods for building a domain IE system, one of which is adapting an open IE system to a domain. In this work, we develop a domain IE system from an open IE system. We start from the existing modular and extensible open IE system developed by Saputra (2018) and add two components to it: a class recognizer and a domain relation mapper. Adding these two components preserves the modularity and extensibility of the earlier open IE system. The class recognizer is responsible for recognizing domain-specific classes in the open IE extraction results, using word lists and regular expressions. The domain relation mapper is responsible for mapping open IE extraction results to domain-specific relations, using mapping rules. The mapping rules are generated automatically by a covering algorithm from domain data prepared by the user. With these two components, the new system extracts information related to a domain and ignores information unrelated to it.
format	Final Project
author	Ariq, Irfan
title	UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM
url	https://digilib.itb.ac.id/gdl/view/39979
_version_	1822925584927293440
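
The two components named in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the record does not say which covering algorithm is used or how rules are represented, so the sketch assumes a greedy, PRISM-style sequential covering over three attributes (class of the first argument, relation phrase, class of the second argument). All names (`recognize_class`, `learn_rules`, `map_relation`), class labels, word lists, patterns, and training triples below are hypothetical.

```python
import re

# Toy domain resources; every word, pattern, and class name is an assumption.
WORD_LISTS = {"CITY": {"bandung", "jakarta"}, "ORG": {"itb", "university"}}
PATTERNS = {"YEAR": re.compile(r"\b(?:19|20)\d{2}\b")}


def recognize_class(argument):
    """Class recognizer: try the word lists first, then the regular expressions."""
    tokens = set(re.findall(r"\w+", argument.lower()))
    for cls, words in WORD_LISTS.items():
        if tokens & words:
            return cls
    for cls, pattern in PATTERNS.items():
        if pattern.search(argument):
            return cls
    return "OTHER"


def features(triple):
    """Describe an open IE triple (arg1, relation phrase, arg2) by three attributes."""
    arg1, rel, arg2 = triple
    return {"arg1_class": recognize_class(arg1), "rel": rel.lower(),
            "arg2_class": recognize_class(arg2)}


def matches(conds, feats):
    return all(feats[a] == v for a, v in conds.items())


def learn_rules(examples):
    """Sequential covering: grow one rule at a time until all positives are covered."""
    data = [(features(t), label) for t, label in examples]
    rules = []
    for label in sorted({lab for _, lab in data if lab is not None}):
        remaining = [f for f, lab in data if lab == label]   # uncovered positives
        negatives = [f for f, lab in data if lab != label]
        while remaining:
            conds, pos, neg = {}, remaining, negatives
            while neg:  # add tests until the rule covers no negative example
                candidates = sorted({(a, v) for f in pos for a, v in f.items()
                                     if a not in conds})
                if not candidates:
                    break  # inconsistent data: accept an imperfect rule

                def score(test):
                    a, v = test
                    p = sum(f[a] == v for f in pos)
                    n = sum(f[a] == v for f in neg)
                    return (p / (p + n), p)  # precision first, then coverage

                a, v = max(candidates, key=score)
                conds[a] = v
                pos = [f for f in pos if f[a] == v]
                neg = [f for f in neg if f[a] == v]
            rules.append((conds, label))
            remaining = [f for f in remaining if not matches(conds, f)]
    return rules


def map_relation(triple, rules):
    """Domain relation mapper: first matching rule wins; None means 'ignore'."""
    feats = features(triple)
    for conds, label in rules:
        if matches(conds, feats):
            return label
    return None


# Toy "domain data prepared by the user": open IE triples with a target relation,
# or None when the triple is unrelated to the domain.
TRAIN = [
    (("ITB", "is located in", "Bandung"), "located_in"),
    (("The university", "was founded in", "1990"), "founded_in"),
    (("Irfan", "is located in", "a good mood"), None),
    (("The university", "was founded in", "a hurry"), None),
    (("Irfan", "visited", "Jakarta"), None),
]
RULES = learn_rules(TRAIN)
```

Calling `map_relation(triple, RULES)` on a new open IE triple returns a domain relation name, or `None` when no rule fires, which corresponds to ignoring information unrelated to the domain. With so little training data the learned rules are very general; a realistic rule set would need far more labeled triples.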

Similar Items