Incorporating contexts to open information extraction

Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit...

Bibliographic Details
Main Author:	Dong, Kuicai
Other Authors:	Sun Aixin
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science Open information extraction Natural language processing
Online Access:	https://hdl.handle.net/10356/174529
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-174529
record_format	dspace
spelling	sg-ntu-dr.10356-174529 2024-05-03T02:58:52Z Incorporating contexts to open information extraction Dong, Kuicai Sun Aixin School of Computer Science and Engineering AXSun@ntu.edu.sg Computer and Information Science Open information extraction Natural language processing Doctor of Philosophy 2024-04-01T06:00:06Z 2024-04-01T06:00:06Z 2024 Thesis-Doctor of Philosophy Dong, K. (2024). Incorporating contexts to open information extraction. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174529 https://hdl.handle.net/10356/174529 10.32657/10356/174529 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Open information extraction Natural language processing
spellingShingle	Computer and Information Science Open information extraction Natural language processing Dong, Kuicai Incorporating contexts to open information extraction
description	Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit reasoning, and text summarization. Unlike closed Information Extraction (IE) tasks, which have pre-defined ontology schemas in predictable domains, OpenIE aims to extract succinct but meaningful entities/relations in open form. As a result, the format of the relations and subjects/objects of the extracted tuples is more flexible, making the extractions challenging to evaluate. Meanwhile, pattern learning for OpenIE is challenging, as there is insufficient gold-standard training data. Existing OpenIE models are trained in either an unsupervised or a distantly supervised way, so the learnt patterns are inferior to gold-standard ones. In this thesis, we introduce several novel approaches to tackle the challenges in pattern learning for OpenIE. The key theme of our approaches is to utilize various types of context to improve OpenIE. Firstly, we propose to improve OpenIE with document-level context. For this new task, we introduce DocOIE, the first expert-annotated dataset for evaluating document-level OpenIE systems. In this setting, we present a neural OpenIE system named DocIE that can leverage document-level contexts for relational tuple extraction. Secondly, we study how to improve OpenIE with additional syntactic information as external context. We design a novel strategy to map phrase-level relations in a constituency tree into word-level relations, and to enhance each word’s representation with constituency path information. We then propose SMiLe-OIE, the first neural OpenIE system that incorporates heterogeneous syntactic information through GCN encoders and multi-view learning. Thirdly, we study how to improve the efficiency and adaptability of OpenIE. Accordingly, we propose a novel notion of Sentence as Chunk sequence (SaC) as an intermediate layer for OpenIE. We also propose Chunk-OIE, an end-to-end learning model that (i) represents a sentence as a SaC, and (ii) extracts tuples based on the SaC. Through data analysis against gold tuples, we show that chunks provide a suitable granularity of token spans for OpenIE. Finally, we propose and study a new research task to examine the reliability of OpenIE, by linking speculation detection and OpenIE. Formally, we propose to detect tuple-level speculation, which aligns well with the goal of OpenIE to extract only factual information. Then, we propose SpecTup, a baseline model to detect tuple-level speculation. SpecTup leverages both semantic (BERT) and syntactic (Sub-Dependency-Graph) representations. All in all, although the problems of OpenIE have been established and investigated, this thesis contributes several pivotal ideas/concepts that could further improve OpenIE. Additionally, the thesis sheds light on promising avenues for future research in OpenIE.
author2	Sun Aixin
author_facet	Sun Aixin Dong, Kuicai
format	Thesis-Doctor of Philosophy
author	Dong, Kuicai
author_sort	Dong, Kuicai
title	Incorporating contexts to open information extraction
title_short	Incorporating contexts to open information extraction
title_full	Incorporating contexts to open information extraction
title_fullStr	Incorporating contexts to open information extraction
title_full_unstemmed	Incorporating contexts to open information extraction
title_sort	incorporating contexts to open information extraction
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/174529
_version_	1800916356138270720

Similar Items