A domain specific virtual assistant using paraphrase generation for data augmentation and Sentence transformers on limited data
The use of conversational agents can be extremely beneficial in many areas such as government offices, schools, banks, malls, etc. where people often make inquiries and responses from personnel can take some time. Many of these areas, however, have inquiries that involve domain-specific vocabulary a...
Main Author: | Roque, Matthew Theodore C. |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2023 |
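Subjects: | Chatbots; Natural language processing (Computer science); Human-computer interaction; Electrical and Computer Engineering |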
Online Access: | https://animorepository.dlsu.edu.ph/etdm_ece/28 https://animorepository.dlsu.edu.ph/context/etdm_ece/article/1028/viewcontent/A_Domain2_Specific_Virtual_Assistant_Using_Paraphrase_Generation_f.pdf |
Institution: | De La Salle University |
id | oai:animorepository.dlsu.edu.ph:etdm_ece-1028 |
---|---|
record_format | eprints |
spelling | oai:animorepository.dlsu.edu.ph:etdm_ece-1028; 2023-10-01T23:56:15Z; 2023-08-05T07:00:00Z; text; application/pdf; Electronics And Communications Engineering Master's Theses |
institution | De La Salle University |
building | De La Salle University Library |
continent | Asia |
country | Philippines |
content_provider | De La Salle University Library |
collection | DLSU Institutional Repository |
language | English |
topic | Chatbots; Natural language processing (Computer science); Human-computer interaction; Electrical and Computer Engineering |
description | The use of conversational agents can be extremely beneficial in many areas such as government offices, schools, banks, malls, etc. where people often make inquiries and responses from personnel can take some time. Many of these areas, however, have inquiries that involve domain-specific vocabulary and most likely do not have a large amount of data or computational resources to properly train a complex natural language processing (NLP) model. This paper proposes a method for creating a domain-specific virtual assistant using Generative Pre-Trained Transformer-3 (GPT-3) to generate paraphrases on a relatively small dataset, and a Sentence Transformer (SBERT) model with a distilled version of BERT (DistilBERT) base, pretrained on the Quora Question Pairs dataset, and fine-tuned on the augmented dataset. This method of creating a model is evaluated on the MS MARCO, SemEval, and PubMed datasets using mean average precision (MAP), precision at k (P@k), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR) as performance metrics. The method was also demonstrated using a small dataset of 188 frequently asked questions from the De La Salle University website that also includes domain-specific vocabulary. The implementation of the fine-tuned model was demonstrated on a simple webpage and the results were found to be satisfactory. |
format | text |
author | Roque, Matthew Theodore C. |
title | A domain specific virtual assistant using paraphrase generation for data augmentation and Sentence transformers on limited data |
publisher | Animo Repository |
publishDate | 2023 |
url | https://animorepository.dlsu.edu.ph/etdm_ece/28 https://animorepository.dlsu.edu.ph/context/etdm_ece/article/1028/viewcontent/A_Domain2_Specific_Virtual_Assistant_Using_Paraphrase_Generation_f.pdf |
_version_ | 1779260465175592960 |
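
The description names four ranking metrics (MAP, P@k, NDCG, MRR) without defining them. The following is a minimal Python sketch of their standard textbook definitions, not the evaluation code used in the thesis. It assumes binary relevance labels listed in ranked order and that every relevant item for a query appears somewhere in the ranked list; the function names and the toy `rankings` data are hypothetical.

```python
import math
from typing import Iterable, Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant result.

    Assumption: all relevant items are present in the ranked list, so the
    number of hits found equals the total number of relevant items.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: Sequence[int], k: int) -> float:
    """DCG of the top-k results divided by the DCG of the ideal ordering.

    Assumption: the ideal ordering is built from the same list of labels.
    """
    def dcg(scores: Sequence[int]) -> float:
        return sum(s / math.log2(i + 1) for i, s in enumerate(scores, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal else 0.0


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


# Toy data: one list of 0/1 relevance labels per query, in ranked order.
rankings = [
    [0, 1, 0, 1],  # relevant answers at ranks 2 and 4
    [1, 0, 0, 0],  # relevant answer at rank 1
]

map_score = mean(average_precision(r) for r in rankings)  # MAP
mrr_score = mean(reciprocal_rank(r) for r in rankings)    # MRR
p_at_2 = mean(precision_at_k(r, 2) for r in rankings)     # P@2
ndcg_at_4 = mean(ndcg_at_k(r, 4) for r in rankings)       # NDCG@4

print(f"MAP={map_score:.3f} MRR={mrr_score:.3f} "
      f"P@2={p_at_2:.3f} NDCG@4={ndcg_at_4:.3f}")
```

MAP and MRR are averages over queries of the per-query average precision and reciprocal rank, which is why the sketch computes each metric per ranking first and averages afterwards.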