Document level relationship extraction
The process of Document Level Relationship Extraction (RE) consists of inputting multiple sentences into an RE model to output a relationship between entities that otherwise cannot be determined using context from only a single sentence. This is a more challenging task as it requires the analysis of...
Main Author: | Leong, Marcus Yu Zhen |
---|---|
Other Authors: | Lihui Chen |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University 2023 |
Subjects: | Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/167545 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-167545 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1675452023-07-07T15:44:53Z Document level relationship extraction Leong, Marcus Yu Zhen Lihui Chen School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Electrical and electronic engineering The process of Document Level Relationship Extraction (RE) consists of inputting multiple sentences into an RE model to output a relationship between entities that otherwise cannot be determined using context from only a single sentence. This is a more challenging task as it requires the analysis of context from multiple sentences. The current baseline method uses a BiLSTM model to encode the entire document. However, the BiLSTM model must be trained from scratch and will not be able to accurately capture the intricacies between entities when trained only on the given dataset. To properly capture the context of the interaction, we propose incorporating a state-of-the-art RoBERTa-Large model, a variant of BERT that is already pretrained on a corpus that is a magnitude larger than the original corpus and further finetuned with the dataset. Additionally, we will be incorporating the concept of limiting the input into the encoder to only three sentences rather than the whole document as a recent study proved that most Entity Relationships (ER) can be inferred using only context from three sentences of a document. The result of implementing the proposed changes leads to a reduction in the memory required to process the input, increase the accuracy of the predicted ER and improve the transferability of the model when provided with input from a domain not found in the training corpus. Bachelor of Engineering (Information Engineering and Media) 2023-05-29T05:32:34Z 2023-05-29T05:32:34Z 2023 Final Year Project (FYP) Leong, M. Y. Z. (2023). Document level relationship extraction. Final Year Project (FYP), Nanyang Technological University, Singapore. 
https://hdl.handle.net/10356/167545 https://hdl.handle.net/10356/167545 en A3062-221 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering |
spellingShingle |
Engineering::Electrical and electronic engineering Leong, Marcus Yu Zhen Document level relationship extraction |
description |
Document Level Relationship Extraction (RE) takes multiple sentences as input to an RE model and outputs a relationship between entities that cannot be determined from the context of a single sentence alone. This makes the task more challenging than sentence-level RE, because it requires analysing context across multiple sentences.
The current baseline method uses a BiLSTM model to encode the entire document. However, the BiLSTM must be trained from scratch and cannot accurately capture the intricacies between entities when trained only on the given dataset. To capture the context of the interaction properly, we propose incorporating a state-of-the-art RoBERTa-Large model, a variant of BERT pretrained on a corpus an order of magnitude larger than BERT's original corpus, and further fine-tuning it on the dataset.
Additionally, we limit the input to the encoder to only three sentences rather than the whole document, as a recent study showed that most Entity Relationships (ER) can be inferred using context from only three sentences of a document.
Implementing the proposed changes reduces the memory required to process the input, increases the accuracy of the predicted ER, and improves the transferability of the model when it is given input from a domain not found in the training corpus. |
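The idea of limiting the encoder input to three sentences can be illustrated with a minimal sketch. The abstract does not say how the three sentences are chosen, so the heuristic below is an assumption: it picks the closest pair of sentences mentioning the two entities and fills any remaining budget with sentences lying between them. The function names `select_sentences` and `build_encoder_input` are hypothetical and are not taken from the project.

```python
from typing import List, Sequence, Set


def select_sentences(
    head_sents: Set[int], tail_sents: Set[int], max_sents: int = 3
) -> List[int]:
    """Return at most `max_sents` sentence indices covering both entities.

    Assumed heuristic (not from the source): take the closest pair of
    sentences that mention the head and tail entity, then add sentences
    between them until the budget is used up.
    """
    if not head_sents or not tail_sents:
        raise ValueError("each entity needs at least one mention")
    # Closest head/tail sentence pair; ties go to the earliest position.
    h, t = min(
        ((h, t) for h in sorted(head_sents) for t in sorted(tail_sents)),
        key=lambda pair: (abs(pair[0] - pair[1]), min(pair)),
    )
    chosen = {h, t}
    # Fill the remaining budget with sentences between the two mentions.
    for i in range(min(h, t) + 1, max(h, t)):
        if len(chosen) >= max_sents:
            break
        chosen.add(i)
    return sorted(chosen)


def build_encoder_input(
    sentences: Sequence[str],
    head_sents: Set[int],
    tail_sents: Set[int],
    max_sents: int = 3,
) -> str:
    """Join only the selected sentences into the text passed to the encoder."""
    keep = select_sentences(head_sents, tail_sents, max_sents)
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    # Toy document: the head entity appears in sentence 0, the tail in 4.
    doc = ["S0 mentions A.", "S1.", "S2.", "S3.", "S4 mentions B."]
    print(build_encoder_input(doc, head_sents={0}, tail_sents={4}))
```

Because the encoder then sees a few sentences instead of the whole document, the input sequence is shorter, which is consistent with the memory reduction described above.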
author2 |
Lihui Chen |
author_facet |
Lihui Chen Leong, Marcus Yu Zhen |
format |
Final Year Project |
author |
Leong, Marcus Yu Zhen |
author_sort |
Leong, Marcus Yu Zhen |
title |
Document level relationship extraction |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/167545 |
_version_ |
1772828274998640640 |