Knowledge graph construction from text

Bibliographic Details
Main Author: Yong, Shan Jie
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Online Access: https://hdl.handle.net/10356/153229
Institution: Nanyang Technological University
Description
Summary: Open Information Extraction (OpenIE) has been the go-to tool for making sense of, and adding structure to, otherwise unstructured text documents. The goal of an OpenIE system is to extract semantic triples (Subject-Relation->Object) from text. The Subject and Object in a semantic triple are typically entities with a Noun, Proper Noun, or Pronoun Part-of-Speech (POS) tag. In English text, a Proper Noun, for example, is often referred to by a Pronoun after its first mention. These substitutions ease written and verbal communication. In Information Extraction, however, they can cause ambiguity during semantic triple extraction, because a Pronoun may be treated as an entity independent of its antecedent. This project aims to resolve this ambiguity by integrating OpenIE systems with Coreference Resolution, thereby allowing relations between entities to be extracted across the entire document. Additionally, among all the coreferent mentions of an entity, one term best represents that entity. Existing methods for identifying this representative term include picking the longest term or picking the first term. This project experiments with methods that extract features of each coreferent term in order to select the likeliest representative term, allowing both anaphoric and cataphoric references to be resolved with a higher degree of certainty.
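
The two baseline heuristics named in the summary (first term, longest term) can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the mention format, and the hand-written coreference cluster are all assumptions made for illustration, and a real pipeline would obtain clusters from a coreference model before passing the rewritten text to an OpenIE system.

```python
from typing import Callable, List, Tuple

# Hypothetical mention format: (start token index, end token index (exclusive), surface text).
Mention = Tuple[int, int, str]


def pick_first(cluster: List[Mention]) -> str:
    """Baseline: use the earliest mention in the document as the representative term."""
    return min(cluster, key=lambda m: m[0])[2]


def pick_longest(cluster: List[Mention]) -> str:
    """Baseline: use the mention with the most characters as the representative term."""
    return max(cluster, key=lambda m: len(m[2]))[2]


def resolve(tokens: List[str],
            clusters: List[List[Mention]],
            pick: Callable[[List[Mention]], str]) -> str:
    """Replace every mention in each cluster with that cluster's representative term."""
    replacements = {}
    for cluster in clusters:
        representative = pick(cluster)
        for start, end, _text in cluster:
            replacements[start] = (end, representative)

    out = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, representative = replacements[i]
            out.append(representative)
            i = end  # skip the tokens covered by the replaced mention
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# A cataphoric example: the pronoun "she" precedes its referent "Ada Lovelace".
tokens = "Before she moved to London , Ada Lovelace studied mathematics .".split()
clusters = [[(1, 2, "she"), (6, 8, "Ada Lovelace")]]  # hand-written, not model output

resolved_first = resolve(tokens, clusters, pick_first)
resolved_longest = resolve(tokens, clusters, pick_longest)
print(resolved_first)
print(resolved_longest)
```

In this cataphoric sentence the first-term heuristic selects the pronoun itself, so the ambiguity remains, while the longest-term heuristic selects the proper noun. This is the kind of case that motivates choosing the representative term from features of each mention rather than from position alone.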