Incorporating contexts to open information extraction
Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit...
Main Author: | Dong, Kuicai |
---|---|
Other Authors: | Sun Aixin |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science; Open information extraction; Natural language processing |
Online Access: | https://hdl.handle.net/10356/174529 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-174529 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-174529; 2024-05-03T02:58:52Z; Incorporating contexts to open information extraction; Dong, Kuicai; Sun Aixin; School of Computer Science and Engineering; AXSun@ntu.edu.sg; Computer and Information Science; Open information extraction; Natural language processing; Doctor of Philosophy; 2024-04-01T06:00:06Z; 2024; Thesis-Doctor of Philosophy; Dong, K. (2024). Incorporating contexts to open information extraction. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174529; 10.32657/10356/174529; en; This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science; Open information extraction; Natural language processing |
description |
Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit reasoning, and text summarization. Unlike closed Information Extraction (IE) tasks, which have pre-defined ontology schemas in predictable domains, OpenIE aims to extract succinct but meaningful entities and relations in open form. As a result, the formats of the relations and of the subjects/objects in the extracted tuples are more flexible, which makes the extractions challenging to evaluate. Pattern learning for OpenIE is also challenging, as there is insufficient gold-standard training data. Existing OpenIE models are trained in either an unsupervised or a distantly supervised way, so the learnt patterns are inferior to gold-standard ones.
In this thesis, we introduce several novel approaches to tackle the challenges in the pattern learning of OpenIE. The key theme of our approaches is to utilize various types of context to improve OpenIE. Firstly, we propose to improve OpenIE with document-level context. As a new task, we introduce DocOIE, the first expert-annotated dataset for evaluating document-level OpenIE systems. In this setting, we present a neural OpenIE system named DocIE that can leverage document-level contexts for relational tuple extraction. Secondly, we study how to improve OpenIE with additional syntactic information as external context. We design a novel strategy to map phrase-level relations in a constituency tree into word-level relations, and to enhance each word's representation with constituency path information. We then propose SMiLe-OIE, the first neural OpenIE system that incorporates heterogeneous syntactic information through GCN encoders and multi-view learning. Thirdly, we study how to improve the efficiency and adaptability of OpenIE. Accordingly, we propose a novel notion of Sentence as Chunk sequence (SaC) as an intermediate layer for OpenIE. We also propose Chunk-OIE, an end-to-end learning model that (i) represents a sentence as a SaC, and (ii) extracts tuples based on the SaC. Through data analysis against gold tuples, we show that chunks provide a suitable granularity of token spans for OpenIE. Finally, we propose and study a new research task to examine the reliability of OpenIE, by linking speculation detection and OpenIE. Formally, we propose to detect tuple-level speculation, which aligns well with the goal of OpenIE to extract only factual information. We then propose SpecTup, a baseline model to detect tuple-level speculation. SpecTup leverages both semantic (BERT) and syntactic (Sub-Dependency-Graph) representations.
All in all, although the problems of OpenIE have been established and investigated, this thesis contributes several pivotal ideas and concepts that could further improve OpenIE. Additionally, the thesis sheds light on promising avenues for future research in OpenIE. |
author2 |
Sun Aixin |
author_facet |
Sun Aixin Dong, Kuicai |
format |
Thesis-Doctor of Philosophy |
author |
Dong, Kuicai |
author_sort |
Dong, Kuicai |
title |
Incorporating contexts to open information extraction |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174529 |
_version_ |
1800916356138270720 |