EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES

Performing event extraction with a manually predefined event schema/template requires considerable effort. Therefore, several works on event extraction have proposed methods to generate the event schema automatically. A common approach to automatic schema generation is to employ redundant wor...

Full description


Bibliographic Details
Main Author:	Romadhony, Ade
Format:	Dissertations
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/36671
Institution:	Institut Teknologi Bandung

id	id-itb.:36671
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Performing event extraction with a manually predefined event schema/template requires considerable effort. Therefore, several works on event extraction have proposed methods to generate the event schema automatically. A common approach to automatic schema generation is to employ redundant word co-occurrence information that describes particular event types. This approach performs well when applied to a large document collection, and its results have been tested on related tasks, such as event argument extraction. However, there are several conditions under which the requirement for a large document collection cannot be fulfilled, so information from other sources, such as knowledge bases, becomes useful. By employing knowledge bases that contain word semantic relatedness information, we can obtain additional knowledge for gathering event-related words. In this work, to generate the event schema, we use the extraction results of Open Information Extraction (Open IE), usually called relation tuples. Open IE is an information extraction paradigm that applies minimal restrictions on the types of information extracted. An Open IE relation tuple has a structure consisting of a relation/trigger and arguments, which is similar to the event structure model commonly employed by previous works on automatic event schema generation. The Open IE relation tuple as an intermediate structure has also been tested on several tasks and performs better than other structures, especially on the semantic relatedness task. The contribution of this research lies in the development of a method for clustering relation tuples using external knowledge bases, and in the development of methods to improve the quality of Open IE extraction results through preprocessing of the input and the addition of extraction rules. Grouping relation tuples by semantic relatedness produces a schema that can be used as a template for information extraction.
The proposed method groups relation tuples by emphasizing semantic similarity that does not depend on information obtained from redundancy in the documents, because a redundancy-based method cannot always place relation tuples with high semantic similarity in the same group, especially if the Open IE extraction results are incomplete and contain noise. The grouping method is defined over several variations of similarity calculation, including word co-occurrence statistics, similarity values from the WordNet knowledge base, and similarity values from larger corpus statistics. An external knowledge base is also used in the constrained clustering process and in filtering arguments by certain event semantics. Before relation tuples are grouped, the quality of the Open IE extraction results needs to be evaluated. An examination of the extraction results of the existing Open IE system showed an opportunity to improve their accuracy and completeness. To that end, the input to the Open IE system was pre-processed and the extraction rules were modified. The pre-processing simplifies the input sentence using rule-based methods with punctuation, POS tag, and phrase type features. This method has low complexity compared with the use of more complicated features such as dependency types, yet achieves equivalent performance. For the addition of relation extraction rules, new rules are learned using the decision tree method. The feature proposed for the additional extraction rules is the second-level dependency type.
Extraction rules with dependency type features that are not limited to direct connections are shown to increase the number of relevant relations that can be extracted. The schema resulting from the grouping of relation tuples is evaluated on the task of identifying and extracting event arguments. The test results show that the schema can be used to extract event arguments on the standard Open English Extraction (ASTRE) dataset, and that its precision, recall, and F1 all increase, with the increase in F1 reaching 46% from the 0.13 obtained when no knowledge bases were involved. We also compared the system with state-of-the-art systems, and the argument extraction results show that our proposed system has 4.7% higher precision than the other systems (the previous best precision was 0.21). However, some arguments could not be extracted, so the recall and F1 of our proposed system are lower than those of the state-of-the-art systems.
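The idea of grouping relation tuples by knowledge-base relatedness, as described in the abstract, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the `RelationTuple` class, the hand-written `TOY_KB` table standing in for a real knowledge base such as WordNet, the `threshold` value, and the simple greedy grouping, which is a much simpler substitute for the constrained clustering the abstract mentions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class RelationTuple:
    """An Open IE relation tuple: a relation/trigger plus two arguments."""
    arg1: str
    relation: str
    arg2: str


# Hypothetical stand-in for a knowledge base: pairs of triggers with an
# illustrative relatedness score. A real system would query a resource
# such as WordNet instead of this hand-written table.
TOY_KB: Dict[FrozenSet[str], float] = {
    frozenset({"attack", "bomb"}): 0.8,
    frozenset({"attack", "kill"}): 0.7,
    frozenset({"elect", "vote"}): 0.9,
}


def relatedness(a: str, b: str) -> float:
    """Return the toy relatedness of two triggers (1.0 if identical)."""
    if a == b:
        return 1.0
    return TOY_KB.get(frozenset({a, b}), 0.0)


def group_tuples(tuples: List[RelationTuple],
                 threshold: float = 0.5) -> List[List[RelationTuple]]:
    """Greedy grouping: a tuple joins the first group that already holds
    a trigger related to its own; otherwise it starts a new group."""
    groups: List[List[RelationTuple]] = []
    for t in tuples:
        for group in groups:
            if any(relatedness(t.relation, m.relation) >= threshold
                   for m in group):
                group.append(t)
                break
        else:
            groups.append([t])
    return groups


tuples = [
    RelationTuple("rebels", "attack", "the convoy"),
    RelationTuple("voters", "elect", "a new mayor"),
    RelationTuple("a car", "bomb", "the embassy"),
    RelationTuple("citizens", "vote", "in the referendum"),
    RelationTuple("gunmen", "kill", "two guards"),
]
groups = group_tuples(tuples)
for group in groups:
    print([t.relation for t in group])
```

Each resulting group collects triggers and their arguments for one candidate event type, so no trigger pair needs to co-occur in the same document for the tuples to end up together; that is the property the abstract attributes to knowledge-base similarity.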
format	Dissertations
author	Romadhony, Ade
spellingShingle	Romadhony, Ade EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
author_facet	Romadhony, Ade
author_sort	Romadhony, Ade
title	EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
title_short	EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
title_full	EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
title_fullStr	EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
title_full_unstemmed	EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES
title_sort	event schema generation based on open ie relation tuples using knowledge bases
url	https://digilib.itb.ac.id/gdl/view/36671
_version_	1823637977557893120
spelling	id-itb.:36671 2019-03-14T10:52:47Z EVENT SCHEMA GENERATION BASED ON OPEN IE RELATION TUPLES USING KNOWLEDGE BASES Romadhony, Ade Indonesia Dissertations event schema; Open IE; relation tuple; knowledge base; clustering; sentence simplification; Open IE extraction rules learning; INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/36671 text
