Generating domain-specific paraphrases of questions from FAQ

This project introduces a paraphrase generation system that generates domain-specific paraphrases of the questions in Frequently Asked Question (FAQ) corpora. It aims to minimise the cost of manually generating paraphrases and to provide effective data augmentation...

Bibliographic Details
Main Author:	Ng, Jing Rui
Other Authors:	Chng Eng Siong
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/148155
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-148155
record_format	dspace
spelling	sg-ntu-dr.10356-148155 2021-04-24T06:23:54Z Generating domain-specific paraphrases of questions from FAQ Ng, Jing Rui Chng Eng Siong School of Computer Science and Engineering ASESChng@ntu.edu.sg Engineering::Computer science and engineering This project introduces a paraphrase generation system that generates domain-specific paraphrases of the questions in Frequently Asked Question (FAQ) corpora. It aims to minimise the cost of manually generating paraphrases and to provide effective data augmentation that complements end-to-end FAQ retrieval with large language models by reducing overfitting. This is achieved by paying attention to several unique characteristics of FAQ corpora and by using two large language models, an off-domain labelled paraphrase dataset and abbreviation handling. The two language models used are T5 and Sentence Transformer. The proposed approach involves pre-processing, paraphrase generation, post-processing and candidate paraphrase selection. Firstly, T5 is fine-tuned on the paraphrase dataset for the task of paraphrase generation. Secondly, abbreviation handling is incorporated into the pre-processing of the original question and the post-processing of the generated paraphrase. Thirdly, the Sentence Transformer library is used for candidate paraphrase selection to ensure the semantic similarity of the paraphrase with the original question and the integrity of the paraphrase’s class label. Lastly, a GUI application is provided for users to generate paraphrases of questions from an FAQ dataset. From our experiments, we conclude that the pre-processing, post-processing and candidate paraphrase selection are effective in generating paraphrases and subsequently filtering them to output a set of high-quality domain-specific paraphrases for augmenting the FAQ corpora.
Bachelor of Engineering (Computer Science) 2021-04-24T06:23:54Z 2021-04-24T06:23:54Z 2021 Final Year Project (FYP) Ng, J. R. (2021). Generating domain-specific paraphrases of questions from FAQ. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/148155 https://hdl.handle.net/10356/148155 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
description	This project introduces a paraphrase generation system that generates domain-specific paraphrases of the questions in Frequently Asked Question (FAQ) corpora. It aims to minimise the cost of manually generating paraphrases and to provide effective data augmentation that complements end-to-end FAQ retrieval with large language models by reducing overfitting. This is achieved by paying attention to several unique characteristics of FAQ corpora and by using two large language models, an off-domain labelled paraphrase dataset and abbreviation handling. The two language models used are T5 and Sentence Transformer. The proposed approach involves pre-processing, paraphrase generation, post-processing and candidate paraphrase selection. Firstly, T5 is fine-tuned on the paraphrase dataset for the task of paraphrase generation. Secondly, abbreviation handling is incorporated into the pre-processing of the original question and the post-processing of the generated paraphrase. Thirdly, the Sentence Transformer library is used for candidate paraphrase selection to ensure the semantic similarity of the paraphrase with the original question and the integrity of the paraphrase’s class label. Lastly, a GUI application is provided for users to generate paraphrases of questions from an FAQ dataset. From our experiments, we conclude that the pre-processing, post-processing and candidate paraphrase selection are effective in generating paraphrases and subsequently filtering them to output a set of high-quality domain-specific paraphrases for augmenting the FAQ corpora.
author2	Chng Eng Siong
author_facet	Chng Eng Siong Ng, Jing Rui
format	Final Year Project
author	Ng, Jing Rui
author_sort	Ng, Jing Rui
title	Generating domain-specific paraphrases of questions from FAQ
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/148155
_version_	1698713742665056256
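
The description field above outlines four stages: pre-processing, paraphrase generation, post-processing and candidate paraphrase selection. The following is a minimal Python sketch of how those stages could fit together, not the project's actual implementation. Several things in it are assumptions made for illustration: the abbreviation table, the idea that pre-processing expands abbreviations and post-processing restores them, the 0.5 threshold, and every function name. The generator is a placeholder callable standing in for the fine-tuned T5 model, and bow_cosine is a simple word-count cosine standing in for Sentence Transformer embedding similarity.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical abbreviation table; a real FAQ domain would supply its own.
ABBREVIATIONS: Dict[str, str] = {"NTU": "Nanyang Technological University"}


def expand_abbreviations(text: str, table: Dict[str, str]) -> str:
    """Pre-processing (assumed behaviour): replace abbreviations with long forms."""
    for short, long_form in table.items():
        text = re.sub(rf"\b{re.escape(short)}\b", lambda _m: long_form, text)
    return text


def restore_abbreviations(text: str, table: Dict[str, str]) -> str:
    """Post-processing (assumed behaviour): put abbreviations back."""
    for short, long_form in table.items():
        text = re.sub(re.escape(long_form), lambda _m: short, text,
                      flags=re.IGNORECASE)
    return text


def bow_cosine(a: str, b: str) -> float:
    """Stand-in similarity: cosine over word counts, not a learned embedding."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def paraphrase(question: str,
               generate: Callable[[str], List[str]],
               similarity: Callable[[str, str], float] = bow_cosine,
               table: Dict[str, str] = ABBREVIATIONS,
               threshold: float = 0.5) -> List[str]:
    """Run pre-processing, generation, post-processing and selection."""
    expanded = expand_abbreviations(question, table)
    candidates = [restore_abbreviations(c, table) for c in generate(expanded)]
    kept: List[str] = []
    for cand in candidates:
        if cand.strip().lower() == question.strip().lower():
            continue  # an exact copy of the question adds nothing
        if cand in kept:
            continue  # skip duplicate candidates
        if similarity(question, cand) >= threshold:
            kept.append(cand)  # close enough in meaning to keep
    return kept


def fake_generate(expanded_question: str) -> List[str]:
    """Placeholder for a paraphrase model; returns canned candidates."""
    return [
        "How can I apply to Nanyang Technological University?",
        "What is the weather today?",
        expanded_question,
    ]


result = paraphrase("How do I apply to NTU?", fake_generate)
print(result)
```

In this sketch the off-topic candidate is rejected by the similarity threshold and the unchanged copy of the question is dropped as a non-paraphrase. Swapping bow_cosine for an embedding-based cosine similarity and fake_generate for a real model call would leave the surrounding control flow unchanged.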

Similar Items