Document level relationship extraction

The process of Document Level Relationship Extraction (RE) consists of inputting multiple sentences into an RE model to output a relationship between entities that otherwise cannot be determined using context from only a single sentence. This is a more challenging task as it requires the analysis of...

Bibliographic Details
Main Author: Leong, Marcus Yu Zhen
Other Authors: Lihui Chen
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/167545
Institution: Nanyang Technological University
id sg-ntu-dr.10356-167545
record_format dspace
spelling sg-ntu-dr.10356-167545 2023-07-07T15:44:53Z
Document level relationship extraction
Leong, Marcus Yu Zhen
Lihui Chen
School of Electrical and Electronic Engineering
ELHCHEN@ntu.edu.sg
Engineering::Electrical and electronic engineering
Bachelor of Engineering (Information Engineering and Media)
2023-05-29T05:32:34Z
2023
Final Year Project (FYP)
Leong, M. Y. Z. (2023). Document level relationship extraction. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167545
en
A3062-221
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Leong, Marcus Yu Zhen
Document level relationship extraction
description Document Level Relationship Extraction (RE) feeds multiple sentences into an RE model to output a relationship between entities that cannot be determined from the context of a single sentence alone. This is a more challenging task than sentence-level RE because it requires analysing context across multiple sentences. The current baseline method uses a BiLSTM model to encode the entire document. However, the BiLSTM model must be trained from scratch and cannot accurately capture the intricacies between entities when trained only on the given dataset. To properly capture the context of the interaction, we propose incorporating a state-of-the-art RoBERTa-Large model, a variant of BERT that is pretrained on a corpus roughly an order of magnitude larger than the original BERT corpus and then further fine-tuned on the dataset. Additionally, we limit the input to the encoder to only three sentences rather than the whole document, as a recent study showed that most Entity Relationships (ER) can be inferred using context from only three sentences of a document. Implementing the proposed changes reduces the memory required to process the input, increases the accuracy of the predicted ER, and improves the transferability of the model to input from a domain not found in the training corpus.
author2 Lihui Chen
author_facet Lihui Chen
Leong, Marcus Yu Zhen
format Final Year Project
author Leong, Marcus Yu Zhen
author_sort Leong, Marcus Yu Zhen
title Document level relationship extraction
title_sort document level relationship extraction
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/167545
_version_ 1772828274998640640
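
The description's idea of limiting the encoder input to three sentences can be illustrated with a minimal sketch. The record does not say how the three sentences are chosen, so the heuristic below (take the closest pair of head and tail mentions, then fill the remaining budget with sentences between them) is an assumption, and the function name `select_context` and its parameters are hypothetical.

```python
from typing import List, Sequence


def select_context(
    sentences: Sequence[str],
    head_sents: Sequence[int],
    tail_sents: Sequence[int],
    max_sents: int = 3,
) -> List[str]:
    """Pick at most `max_sents` sentences for one entity pair.

    Hypothetical heuristic (not taken from the project itself):
    `head_sents` and `tail_sents` are the indices of the sentences that
    mention the head and tail entity. Assumes max_sents >= 2.
    """
    if not head_sents or not tail_sents:
        return []
    # Closest pair of mention sentences; ties go to the earliest position.
    h, t = min(
        ((h, t) for h in head_sents for t in tail_sents),
        key=lambda p: (abs(p[0] - p[1]), min(p)),
    )
    lo, hi = min(h, t), max(h, t)
    chosen = {lo, hi}
    # Spend any remaining budget on sentences lying between the two mentions.
    for i in range(lo + 1, hi):
        if len(chosen) >= max_sents:
            break
        chosen.add(i)
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

The selected sentences would then be concatenated and passed to the encoder in place of the full document, which is where the memory saving mentioned in the description would come from.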