UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM
Web pages now hold a large and very diverse body of information. That information would be very useful if it could be processed further, but it is usually unstructured and hard to process. Therefore, a system wh...
Main Author: | Ariq, Irfan |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39979 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:39979 |
---|---|
timestamp |
2019-06-28T14:56:11Z |
keywords |
information extraction system, domain, class recognizer, domain relation mapper |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Web pages now hold a large and very diverse body of information. That information would be very
useful if it could be processed further, but it is usually unstructured and therefore hard to
process. For this reason, systems that transform unstructured or semi-structured information into
structured form have been developed. Such a system is called an information extraction system.
There are two kinds of information extraction system: open information extraction (open IE) and
domain-specific information extraction (domain IE). Domain IE is usually more useful than open IE
because it extracts information related to a domain and discards information unrelated to it.
There are several methods for building domain IE, one of which is adapting open IE to domain IE.
In this work, we develop a domain IE system from an open IE system. We start from the modular and
extensible open IE system previously developed by Saputra (2018). To adapt it to domain IE, we add
two components: a class recognizer and a domain relation mapper. With these two components added,
the new system retains the modularity and extensibility of the original open IE system. The class
recognizer is responsible for recognizing domain-specific classes in the open IE extraction
results, using word lists and regular expressions. The domain relation mapper is responsible for
mapping open IE extraction results to domain-specific relations, using mapping rules. The mapping
rules are generated automatically by a covering algorithm from domain data prepared by the user.
With these two components, the new system extracts information related to a domain and ignores
information unrelated to it. |
format |
Final Project |
author |
Ariq, Irfan |
title |
UTILIZATION OF OPEN DOMAIN INFORMATION EXTRACTION SYSTEM TO DOMAIN-SPECIFIC INFORMATION EXTRACTION SYSTEM |
url |
https://digilib.itb.ac.id/gdl/view/39979 |
_version_ |
1822925584927293440 |
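
The two components named in the description (a class recognizer based on word lists and regular expressions, and a domain relation mapper driven by rules generated with a covering algorithm) can be illustrated with a minimal Python sketch. This is not the author's implementation: the record does not describe the actual data structures, so the `Triple` type, the `ORGANIZATION` and `YEAR` classes, the sample word list, the rule format, and the simplified greedy covering loop are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One open IE extraction result (hypothetical shape)."""
    subject: str
    relation: str
    obj: str


# Hypothetical domain classes: one backed by a word list, one by a regex.
WORD_LISTS = {"ORGANIZATION": {"acme university", "acme"}}
PATTERNS = {"YEAR": re.compile(r"(?:19|20)\d{2}")}


def recognize_class(text: str) -> Optional[str]:
    """Class recognizer: return the domain class of an argument, or None."""
    stripped = text.strip()
    for cls, words in WORD_LISTS.items():
        if stripped.lower() in words:
            return cls
    for cls, pattern in PATTERNS.items():
        if pattern.fullmatch(stripped):
            return cls
    return None


def condition(triple: Triple) -> Optional[tuple]:
    """Rule condition: (subject class, relation phrase, object class)."""
    subj, obj = recognize_class(triple.subject), recognize_class(triple.obj)
    if subj is None or obj is None:
        return None  # arguments outside the domain never match a rule
    return (subj, triple.relation.strip().lower(), obj)


def learn_rules(examples):
    """Simplified covering loop (an assumption, not the thesis algorithm):
    pick the condition covering the most uncovered positive examples, keep it
    as a rule if no example contradicts it, drop what it covers, and repeat."""
    labeled = [(condition(t), label) for t, label in examples]
    labeled = [(c, l) for c, l in labeled if c is not None]
    uncovered = [(c, l) for c, l in labeled if l is not None]
    rules = {}
    while uncovered:
        (cond, label), _ = Counter(uncovered).most_common(1)[0]
        if all(l == label for c, l in labeled if c == cond):
            rules[cond] = label
        uncovered = [(c, l) for c, l in uncovered if c != cond]
    return rules


def map_triple(triple: Triple, rules) -> Optional[Triple]:
    """Domain relation mapper: rewrite a triple, or None if it is off-domain."""
    label = rules.get(condition(triple))
    if label is None:
        return None
    return Triple(triple.subject, label, triple.obj)


# Hypothetical user-prepared domain data: (open IE triple, domain relation).
training = [
    (Triple("Acme", "was founded in", "1999"), "founded_in"),
    (Triple("Acme University", "was founded in", "1985"), "founded_in"),
    (Triple("Acme", "is located near", "a river"), None),
]
rules = learn_rules(training)
```

In this sketch, `map_triple` returns a triple with the domain relation when a learned rule matches and `None` otherwise, which mirrors the described behavior of keeping domain-related extractions and ignoring the rest.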