EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES

Main Author: Romadhony, Ade
Format: Dissertations
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/36671
Institution: Institut Teknologi Bandung
Record ID: id-itb.:36671
Building: Institut Teknologi Bandung Library
Country: Indonesia
Collection: Digital ITB
Description: Event extraction with a manually predefined event schema (template) requires considerable effort. Several works on event extraction have therefore proposed methods to generate the event schema automatically. A common approach to automatic schema generation exploits redundant word co-occurrence information that describes particular event types. This approach performs well on large document collections, and its results have been tested on related tasks such as event argument extraction. However, in some settings a large document collection is not available, so information from other sources, such as knowledge bases, becomes useful. Knowledge bases that contain word semantic relatedness information provide additional knowledge for gathering event-related words. In this work, we generate the event schema from the extraction results of Open Information Extraction (Open IE), usually called relation tuples. Open IE is an information extraction paradigm that applies minimal restrictions on the types of information extracted. An Open IE relation tuple consists of a relation (trigger) and its arguments, a structure similar to the event model commonly employed in previous work on automatic event schema generation. Open IE relation tuples have also been tested as an intermediate structure on several tasks and perform better than other structures, especially on semantic relatedness tasks. The contribution of this research is twofold: a method for clustering relation tuples with the help of external knowledge bases, and methods for improving the quality of Open IE extraction results through input preprocessing and additional extraction rules. Grouping relation tuples by semantic relatedness produces a schema that can be used as a template for information extraction.
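The tuple structure described above can be pictured with a minimal Python sketch. The class name `RelationTuple`, the method `as_event`, the slot names, and the example sentence are all illustrative assumptions and do not come from the dissertation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RelationTuple:
    """An Open IE relation tuple: one relation phrase plus its arguments."""
    relation: str
    arguments: Tuple[str, ...]

    def as_event(self) -> dict:
        """View the tuple as an event: the relation acts as the trigger,
        and each argument fills a positional slot (hypothetical naming)."""
        return {
            "trigger": self.relation,
            "slots": {f"arg{i}": arg for i, arg in enumerate(self.arguments)},
        }


# Hypothetical tuple, as might be extracted from
# "The rebels attacked a military convoy."
example = RelationTuple("attacked", ("the rebels", "a military convoy"))
print(example.as_event())
```

The sketch only shows why a relation tuple maps naturally onto a trigger-plus-arguments event model; real Open IE systems attach further information (confidence scores, spans) that is omitted here.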
The proposed method groups relation tuples by emphasizing semantic similarity that does not depend on redundancy in the documents, because redundancy alone cannot always place relation tuples with high semantic similarity in the same group, especially when the Open IE extraction results are incomplete and noisy. The grouping method is defined over several variants of similarity calculation, including word co-occurrence statistics, similarity values from the WordNet knowledge base, and similarity values from the statistics of a larger corpus. External knowledge bases are also used in the constrained clustering process and in filtering arguments by particular event semantics. Before the relation tuples are grouped, the quality of the Open IE extraction results must be evaluated. An examination of the output of existing Open IE systems shows that there is room to improve the extraction results in terms of accuracy and completeness. To improve both, the Open IE system input was preprocessed and the extraction rules were modified. Preprocessing simplifies each Open IE input sentence with rule-based methods that use punctuation, POS tag, and phrase type features. This method is less complex than approaches that use more complicated features, such as dependency types, yet achieves equivalent performance. For the additional relation extraction rules, new rules are learned with the decision tree method. The feature proposed for these additional rules is the second-level dependency type.
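As a rough illustration of combining co-occurrence statistics with knowledge-base similarity, the sketch below blends the two scores and merges triggers by single-link agglomeration. It is a simplified stand-in under stated assumptions, not the dissertation's method: the `KB_SIM` table is a hypothetical substitute for WordNet similarity values, the Jaccard measure, the equal weighting, and the threshold are arbitrary choices, and constrained clustering is not modeled.

```python
from itertools import combinations

# Hypothetical knowledge-base similarity scores (a stand-in for WordNet values).
KB_SIM = {
    frozenset(("attack", "bomb")): 0.8,
    frozenset(("arrest", "detain")): 0.9,
}


def kb_similarity(a, b):
    """Look up a knowledge-base similarity; unknown pairs score 0."""
    if a == b:
        return 1.0
    return KB_SIM.get(frozenset((a, b)), 0.0)


def cooccurrence_similarity(a, b, documents):
    """Jaccard overlap of the sets of documents in which each trigger occurs."""
    docs_a = {i for i, doc in enumerate(documents) if a in doc}
    docs_b = {i for i, doc in enumerate(documents) if b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def combined_similarity(a, b, documents, weight=0.5):
    """Weighted blend of knowledge-base and co-occurrence similarity."""
    return (weight * kb_similarity(a, b)
            + (1 - weight) * cooccurrence_similarity(a, b, documents))


def cluster(triggers, documents, threshold=0.3):
    """Single-link agglomeration: merge two clusters whenever any
    cross-cluster pair reaches the similarity threshold."""
    clusters = [{t} for t in triggers]
    merged = True
    while merged:
        merged = False
        for x, y in combinations(clusters, 2):
            if any(combined_similarity(a, b, documents) >= threshold
                   for a in x for b in y):
                clusters.remove(x)
                clusters.remove(y)
                clusters.append(x | y)
                merged = True
                break  # restart the scan over the updated cluster list
    return clusters


# Toy corpus: each document is the set of triggers found in it.
documents = [{"attack", "arrest"}, {"bomb"}, {"detain"}, {"attack"}]
result = cluster(["attack", "bomb", "arrest", "detain"], documents)
print(result)
```

In this toy corpus "attack" and "bomb" never co-occur, so co-occurrence alone would not group them; the knowledge-base score is what brings them together, which is the motivation the abstract gives for using external knowledge.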
Extraction rules whose dependency type features are not limited to direct connections are shown to increase the number of relevant relations that can be extracted. The schema produced by grouping relation tuples is evaluated on the task of identifying and extracting event arguments. The test results show that the schema can be used to extract event arguments on the standard Open English Extraction (ASTRE) dataset, and that precision, recall, and F1 all improve, with F1 increasing by up to 46% from the 0.13 obtained when no knowledge bases were involved. We also compared the system with state-of-the-art systems; in argument extraction, the proposed system has 4.7% higher precision than the other systems (the previous best precision was 0.21). However, some arguments could not be extracted, so the recall and F1 of the proposed system are lower than those of the state-of-the-art systems.
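For reference, precision, recall, and F1 for argument extraction can be computed as in the following sketch. The representation of an argument as an (event type, role, text) triple, the exact-match criterion, and the example data are assumptions made for illustration; the dissertation's actual matching protocol may differ.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of extracted arguments."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical arguments as (event type, role, text) triples.
gold = {
    ("attack", "perpetrator", "the rebels"),
    ("attack", "target", "a military convoy"),
    ("attack", "location", "the capital"),
    ("attack", "victim", "two soldiers"),
}
predicted = {
    ("attack", "perpetrator", "the rebels"),
    ("attack", "target", "a military convoy"),
    ("attack", "target", "the capital"),
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a system can lead on precision while trailing on F1 when many gold arguments are missed, which is the situation the abstract reports.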
Keywords: event schema; Open IE; relation tuple; knowledge base; clustering; sentence simplification; Open IE extraction rules learning
Record timestamp: 2019-03-14T10:52:47Z