Development of an API for EDU segmentation

EDU stands for elementary discourse unit, a clause-like structure within a sentence. EDU segmentation refers to determining the boundaries at which sentences are split into multiple EDUs. This project aims to experiment with and develop EDU segmentation models. The experiments are conducted using the Rhetori...

Bibliographic Details
Main Author:	Liu, Qingyi
Other Authors:	Sun Aixin
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/166098
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-166098
record_format	dspace
spelling	sg-ntu-dr.10356-1660982023-04-21T15:38:38Z Development of an API for EDU segmentation Liu, Qingyi Sun Aixin School of Computer Science and Engineering AXSun@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2023-04-21T07:07:48Z 2023-04-21T07:07:48Z 2023 Final Year Project (FYP) Liu, Q. (2023). Development of an API for EDU segmentation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166098 https://hdl.handle.net/10356/166098 en SCSE22-0190 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Liu, Qingyi Development of an API for EDU segmentation
description	EDU stands for elementary discourse unit, a clause-like structure within a sentence. EDU segmentation refers to determining the boundaries at which sentences are split into multiple EDUs. This project aims to experiment with and develop EDU segmentation models. The experiments are conducted using the Rhetorical Structure Theory (RST) dataset, and model performance is evaluated using the F1-score on token-level EDU boundaries. The existing research model, Segbot, has a Seq2seq architecture that uses a bi-GRU encoder and a GRU decoder with a pointer network to select the boundaries for EDU segmentation. To improve Segbot, we propose replacing its bi-GRU encoder with the encoder of the generative pretrained model BART. This model achieved an F1-score of 94.5%. Token classification of EDU boundaries is also explored, by fine-tuning pretrained models such as BERT and by using POS-tag embeddings as additional input features. Segbot with the BART encoder yielded the highest performance; hence, its model weights will be used to develop a Python API library in the future. This library would make EDU segmentation easier to use in downstream NLP tasks such as sentiment analysis and question answering.
author2	Sun Aixin
author_facet	Sun Aixin Liu, Qingyi
format	Final Year Project
author	Liu, Qingyi
author_sort	Liu, Qingyi
title	Development of an API for EDU segmentation
title_short	Development of an API for EDU segmentation
title_full	Development of an API for EDU segmentation
title_fullStr	Development of an API for EDU segmentation
title_full_unstemmed	Development of an API for EDU segmentation
title_sort	development of an api for edu segmentation
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/166098
_version_	1764208175053012992
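
The description above reports model quality as an F1-score over token-level EDU boundaries. The following is a minimal sketch of how such a score could be computed, not the project's actual evaluation code. The function name `boundary_f1`, the representation of boundaries as token indices, and the example indices are all assumptions made for illustration; whether the sentence-final boundary is counted also varies between evaluation setups.

```python
from typing import Iterable, Set, Tuple


def boundary_f1(gold: Iterable[int], predicted: Iterable[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over sets of boundary token indices.

    Hypothetical helper: each index marks the last token of an EDU.
    """
    gold_set: Set[int] = set(gold)
    pred_set: Set[int] = set(predicted)

    # A predicted boundary is correct only if it matches a gold index exactly.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: gold EDUs end at tokens 3 and 9; the model also predicts 6.
gold = [3, 9]
predicted = [3, 6, 9]
precision, recall, f1 = boundary_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this made-up case the spurious boundary at token 6 lowers precision while recall stays perfect, so the F1-score falls between the two.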

Similar Items