EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
Main Author: | Romadhony, Ade |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/36671 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:36671 |
---|---|
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
collection |
Digital ITB |
description |
Performing event extraction with a manually predefined event schema/template requires considerable effort. Therefore, several works on event extraction have proposed methods to generate the event schema automatically. A common approach to automatic schema generation employs redundant word co-occurrence information that describes particular event types. This approach performs well when applied to a large document collection, and its results have been tested on related tasks such as event argument extraction. However, in some conditions the requirement for a large document collection cannot be fulfilled, and information from other sources, such as knowledge bases, becomes useful. By employing knowledge bases that contain information on the semantic relatedness of words, we can obtain additional knowledge for gathering event-related words.
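
The contrast between redundancy-based and knowledge-based relatedness can be sketched in Python. This is an illustration only: it assumes document-level co-occurrence windows and normalized pointwise mutual information (NPMI) as the co-occurrence statistic, neither of which the abstract specifies, and `kb_similarity` is a hypothetical lookup standing in for a knowledge-base relatedness score. Word pairs never observed together in a small collection receive no co-occurrence evidence, which is the gap a knowledge base can fill.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_scores(documents):
    """Normalized PMI for word pairs that co-occur in at least one document.

    Each document is a list of tokens; a pair is counted at most once per
    document (document-level windows are an assumption of this sketch).
    """
    n_docs = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored sorted
    scores = {}
    for (a, b), n_ab in pair_df.items():
        p_ab = n_ab / n_docs
        p_a, p_b = word_df[a] / n_docs, word_df[b] / n_docs
        pmi = math.log2(p_ab / (p_a * p_b))
        # NPMI divides by -log p(a, b); a pair seen in every document scores 1.0.
        scores[(a, b)] = 1.0 if p_ab == 1.0 else pmi / -math.log2(p_ab)
    return scores


def relatedness(a, b, corpus_scores, kb_similarity):
    """Prefer corpus evidence; fall back to a (hypothetical) knowledge-base score."""
    key = tuple(sorted((a, b)))
    if key in corpus_scores:
        return corpus_scores[key]
    return kb_similarity.get(key, 0.0)
```

For example, in a toy collection where "bomb" and "attack" never appear in the same document, `relatedness("bomb", "attack", ...)` can only return the knowledge-base value.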
In this work, we generate the event schema from the results of Open Information Extraction (Open IE), usually called relation tuples. Open IE is an information extraction paradigm that places minimal restrictions on the types of information extracted. An Open IE relation tuple consists of a relation (trigger) and its arguments, a structure similar to the event structure model commonly employed in previous work on automatic event schema generation. Open IE relation tuples have also been tested as an intermediate structure on several tasks and have outperformed other structures, especially on the semantic relatedness task.
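
The parallel between an Open IE relation tuple and an event structure can be made concrete with a small data structure. This is a sketch: the class names, the positional slot names (arg0, arg1, and so on), and the example tuple are hypothetical and written by hand rather than produced by any particular Open IE system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationTuple:
    """An Open IE relation tuple: (arg1; relation; arg2, arg3, ...)."""
    arg1: str
    relation: str
    args: tuple = ()


@dataclass
class EventMention:
    """An event structure: a trigger plus a mapping from slots to argument text."""
    trigger: str
    arguments: dict = field(default_factory=dict)


def tuple_to_event(t: RelationTuple) -> EventMention:
    # Treat the relation phrase as the trigger and number arguments by position.
    # The slot names are hypothetical placeholders that a generated schema would
    # later replace with role labels.
    slots = {"arg0": t.arg1}
    for i, arg in enumerate(t.args, start=1):
        slots[f"arg{i}"] = arg
    return EventMention(trigger=t.relation, arguments=slots)
```

For the hand-written tuple ("the bomb"; "killed"; "three people", "in Baghdad"), the trigger is "killed" and the three arguments fill slots arg0 to arg2.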
The contributions of this research are a method for clustering relation tuples using external knowledge bases, and methods to improve the quality of Open IE extraction results by pre-processing the input and by adding extraction rules. Grouping relation tuples based on semantic links produces a schema that can be used as a template for information extraction. The proposed grouping method emphasizes semantic similarity that does not depend on information obtained from redundancy within the documents, because redundancy-based grouping cannot always place relation tuples with high semantic similarity in the same group, especially when the Open IE extraction results are incomplete and noisy. The grouping method is defined using several variations of similarity calculation, including word co-occurrence statistics, similarity values from the WordNet knowledge base, and similarity values from larger corpus statistics. External knowledge bases are also used in the constrained clustering process and in filtering arguments according to particular event semantics.
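
One way to combine several similarity measures with knowledge-base constraints is sketched below. The abstract does not name the clustering algorithm, so this uses average-linkage agglomerative clustering with cannot-link constraints as an assumed stand-in; the weights, the threshold, and any similarity table you pass in are hypothetical. A cannot-link pair (for example, two triggers that a knowledge base places in different event types) blocks every merge that would put both words in one cluster.

```python
from itertools import combinations


def combined_similarity(a, b, measures, weights):
    """Weighted average of several similarity functions, e.g. co-occurrence,
    WordNet-based, and corpus-based scores (the weights are hypothetical)."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(measures, weights)) / total


def constrained_clustering(items, sim, threshold, cannot_link=()):
    """Average-linkage agglomerative clustering with cannot-link constraints.

    Repeatedly merges the most similar pair of clusters whose average
    similarity reaches `threshold`, skipping merges that would join any
    cannot-link pair. Returns the final list of clusters.
    """
    forbidden = {frozenset(pair) for pair in cannot_link}
    clusters = [[item] for item in items]

    def violates(c1, c2):
        return any(frozenset((a, b)) in forbidden for a in c1 for b in c2)

    def linkage(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if violates(clusters[i], clusters[j]):
                continue
            score = linkage(clusters[i], clusters[j])
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
```

With a toy similarity table in which "kill" and "bomb" are only weakly related, a 0.4 threshold would otherwise merge {"kill", "murder"} with {"attack", "bomb"} (average similarity 0.45); adding the cannot-link pair ("kill", "bomb") keeps the two groups apart.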
Before relation tuples are grouped, the quality of the Open IE extraction results must be evaluated. An examination of the extraction results of an existing Open IE system showed room for improvement in both accuracy and completeness. To improve them, we pre-processed the input to the Open IE system and modified its extraction rules. The Open IE input sentences are pre-processed by simplifying them with a rule-based method that uses punctuation, POS tag, and phrase type features. This method has lower complexity than approaches that use more complicated features such as dependency types, yet achieves equivalent performance.
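
A rule-based simplifier of this kind operates on tokens annotated with POS tags and phrase (chunk) types. The sketch below implements one hypothetical rule, assuming Penn Treebank POS tags and BIO chunk tags as input; the dissertation's actual rule set is not given in the abstract. It splits a sentence where a comma is followed by a coordinating conjunction and then the start of a noun phrase, so each clause can be passed to the Open IE system separately.

```python
def simplify(tagged):
    """Split a POS-tagged, chunked sentence into simpler clauses.

    tagged: list of (token, pos, chunk) triples with Penn Treebank POS tags
    and BIO chunk tags such as "B-NP" or "I-VP" (an assumed input format).
    Hypothetical rule: "," + CC + start of a noun phrase begins a new clause.
    """
    clauses, current = [], []
    i = 0
    while i < len(tagged):
        token, pos, _chunk = tagged[i]
        if pos == ".":  # drop tokens tagged "." (sentence-final punctuation)
            i += 1
            continue
        if (
            pos == ","
            and i + 2 < len(tagged)
            and tagged[i + 1][1] == "CC"
            and tagged[i + 2][2] == "B-NP"
        ):
            if current:
                clauses.append(current)
            current = []
            i += 2  # skip the comma and the conjunction
            continue
        current.append(token)
        i += 1
    if current:
        clauses.append(current)
    return [" ".join(clause) for clause in clauses]
```

For "The bomb exploded, and police closed the road." the rule yields the clauses "The bomb exploded" and "police closed the road", while a comma inside a noun-phrase list such as "apples, pears and figs" is left alone.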
To add relation extraction rules, new rules are learned with a decision tree method. The feature proposed for these rules is the second-level dependency type. Extraction rules whose dependency type features are not limited to direct connections were shown to increase the number of relevant relations that can be extracted.
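
The idea of a second-level dependency feature can be illustrated on a CoNLL-style parse. In this sketch (an assumption, not the dissertation's feature definition), each candidate argument within two hops of the trigger gets its own dependency label (`dep1`) plus its head's label joined to it (`dep2`). Rows like these, paired with relevant/irrelevant labels, are the kind of input a decision tree learner could turn into extraction rules. The Universal Dependencies labels in the example parse are written by hand.

```python
def dependency_features(heads, deprels, token):
    """First- and second-level dependency-type features for one token.

    heads: 1-based head index per token (0 marks the root), CoNLL-style.
    deprels: dependency label per token, aligned with heads.
    token: 1-based index of the candidate argument token.
    """
    first = deprels[token - 1]
    head = heads[token - 1]
    second = "ROOT" if head == 0 else deprels[head - 1]
    return {"dep1": first, "dep2": f"{second}>{first}"}


def candidate_rows(heads, deprels, trigger):
    """Feature rows for tokens attached to the trigger directly (1 hop) or
    through one intermediate head (2 hops)."""
    rows = []
    for token in range(1, len(heads) + 1):
        head = heads[token - 1]
        if head == trigger:
            hops = 1
        elif head != 0 and heads[head - 1] == trigger:
            hops = 2
        else:
            continue
        features = dependency_features(heads, deprels, token)
        features["hops"] = hops
        rows.append((token, features))
    return rows
```

For the hand-parsed sentence "The attack killed the leader of the group" (heads `[2, 3, 0, 5, 3, 8, 8, 5]`), the token "group" is not a direct dependent of "killed", but its `obj>nmod` feature lets a learned rule reach it: the kind of indirect connection the abstract describes.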
The schema produced by grouping relation tuples is evaluated on the task of identifying and extracting event arguments. The test results show that the resulting schema can be used to extract event arguments on the standard Open English Extraction (ASTRE) dataset, and that its performance, measured by precision, recall, and F1, increases when knowledge bases are used, with the F1 increase reaching 46% over the 0.13 obtained when no knowledge bases were involved. We also compared our system with state-of-the-art systems: in argument extraction, the proposed system achieves 4.7% higher precision than the other systems (the previous best precision was 0.21). However, some arguments could not be extracted, so the recall and F1 of the proposed system are lower than those of the state-of-the-art systems.
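
Precision, recall, and F1 for argument extraction can be computed as set overlaps between predicted and gold arguments. The sketch assumes exact matching of (document, role, argument) triples; the ASTRE evaluation's actual matching criterion may differ, and the triples in the usage note are invented for illustration, not taken from the dataset. It also shows how a system can lead on precision yet trail on recall and F1 when some gold arguments are never extracted.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 over extracted arguments.

    Each argument is a hashable item, e.g. a (document, role, text) triple;
    a prediction counts as correct only if it matches a gold item exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In a toy case with four gold arguments and three predictions of which two are correct, precision is 2/3 but recall is only 0.5, giving F1 = 4/7 (about 0.57).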
|
author |
Romadhony, Ade |
keywords |
event schema; Open IE; relation tuple; knowledge base; clustering; sentence simplification; Open IE extraction rules learning |
record timestamp |
2019-03-14T10:52:47Z |