Knowledge graph construction from text
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/153229 |
Institution: | Nanyang Technological University |
Summary: | Open Information Extraction (OpenIE) has been the go-to tool for making sense of otherwise unstructured text documents and giving them structure. The goal of an OpenIE system is to extract semantic triples (Subject, Relation, Object) from text. The Subject and Object of a semantic triple are typically entities with a Noun, Proper Noun, or Pronoun Part-of-Speech (POS) tag. In English text, Proper Nouns, for example, are often referred to with Pronouns after their first mention. These substitutions ease written and verbal communication, but in Information Extraction they can introduce ambiguity during semantic triple extraction: a pronoun may be treated as an entity independent of its antecedent. This project aims to resolve this ambiguity by integrating OpenIE systems with Coreference Resolution, thereby allowing relations between entities to be extracted across the entire document. Additionally, among all coreference mentions of an entity, one term best represents the entity. Existing methods for identifying this representative term include picking the longest term or picking the first term. This project will experiment with methods that extract features of each coreference term in order to select the most likely representative term, allowing both anaphoric and cataphoric references to be resolved with a higher degree of certainty. |
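The idea summarised above (substituting each coreference cluster's representative mention into OpenIE triples, and comparing first-mention, longest-mention, and feature-based selection) can be illustrated with a minimal sketch. This is not the project's implementation. It assumes that triples and coreference clusters have already been produced by some OpenIE and coreference system and are given as token-offset mentions with a POS tag on each head token. The names `Mention`, `pick_first`, `pick_longest`, `pick_by_features` and `resolve` are hypothetical, the ranking used in `pick_by_features` is an illustrative assumption rather than the features studied in the project, and the example sentences are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Penn Treebank tag groups used by the illustrative (hypothetical) feature ranking.
PRONOUN_TAGS = {"PRP", "PRP$"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


@dataclass(frozen=True)  # frozen => hashable, so mentions can be dict keys
class Mention:
    text: str   # surface form, e.g. "Marie Curie" or "She"
    start: int  # token offset of the mention in the document
    pos: str    # POS tag of the mention's head token


Triple = Tuple[Mention, str, Mention]  # (subject, relation, object)


def pick_first(cluster: List[Mention]) -> Mention:
    """Baseline: the earliest mention in the document."""
    return min(cluster, key=lambda m: m.start)


def pick_longest(cluster: List[Mention]) -> Mention:
    """Baseline: the longest mention by characters; ties go to the earliest."""
    return max(cluster, key=lambda m: (len(m.text), -m.start))


def pick_by_features(cluster: List[Mention]) -> Mention:
    """Hypothetical feature ranking, compared lexicographically:
    proper noun > common noun > pronoun, then more words, then earlier."""
    def rank(m: Mention) -> Tuple[int, int, int]:
        if m.pos in PROPER_NOUN_TAGS:
            tag_rank = 2
        elif m.pos in PRONOUN_TAGS:
            tag_rank = 0
        else:
            tag_rank = 1
        return (tag_rank, len(m.text.split()), -m.start)
    return max(cluster, key=rank)


def resolve(
    triples: List[Triple],
    clusters: List[List[Mention]],
    pick: Callable[[List[Mention]], Mention],
) -> List[Tuple[str, str, str]]:
    """Replace every clustered subject/object with its cluster's representative."""
    representative: Dict[Mention, Mention] = {}
    for cluster in clusters:
        chosen = pick(cluster)
        for mention in cluster:
            representative[mention] = chosen
    # Mentions outside any cluster (e.g. "Warsaw") are kept unchanged.
    return [
        (representative.get(s, s).text, rel, representative.get(o, o).text)
        for s, rel, o in triples
    ]


# Invented example: "She discovered polonium. Marie Curie was born in Warsaw.
# The Polish-born physicist won two Nobel Prizes."
SHE = Mention("She", 0, "PRP")  # cataphoric: appears before the proper noun
CURIE = Mention("Marie Curie", 4, "NNP")
PHYSICIST = Mention("The Polish-born physicist", 11, "NN")
TRIPLES: List[Triple] = [
    (SHE, "discovered", Mention("polonium", 2, "NN")),
    (CURIE, "was born in", Mention("Warsaw", 9, "NNP")),
    (PHYSICIST, "won", Mention("two Nobel Prizes", 15, "NNPS")),
]
CLUSTERS = [[SHE, CURIE, PHYSICIST]]

if __name__ == "__main__":
    for name, pick in [("first", pick_first), ("longest", pick_longest),
                       ("features", pick_by_features)]:
        print(name, resolve(TRIPLES, CLUSTERS, pick))
```

On this invented input, the first-mention baseline keeps the pronoun "She" as the subject of every triple, because the pronoun precedes the proper noun. The longest-mention baseline picks the descriptive phrase "The Polish-born physicist". The hypothetical feature ranking picks "Marie Curie". This shows why selection that looks beyond position and length could matter for cataphoric references.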