A domain specific virtual assistant using paraphrase generation for data augmentation and Sentence transformers on limited data

The use of conversational agents can be extremely beneficial in many areas such as government offices, schools, banks, malls, etc. where people often make inquiries and responses from personnel can take some time. Many of these areas, however, have inquiries that involve domain-specific vocabulary a...


Bibliographic Details
Main Author: Roque, Matthew Theodore C.
Format: text
Language: English
Published: Animo Repository 2023
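Subjects: Chatbots; Natural language processing (Computer science); Human-computer interaction; Electrical and Computer Engineering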
Online Access: https://animorepository.dlsu.edu.ph/etdm_ece/28
https://animorepository.dlsu.edu.ph/context/etdm_ece/article/1028/viewcontent/A_Domain2_Specific_Virtual_Assistant_Using_Paraphrase_Generation_f.pdf
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etdm_ece-1028
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etdm_ece-1028 2023-10-01T23:56:15Z 2023-08-05T07:00:00Z text application/pdf Electronics And Communications Engineering Master's Theses
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Chatbots
Natural language processing (Computer science)
Human-computer interaction
Electrical and Computer Engineering
description Conversational agents can be highly beneficial in settings such as government offices, schools, banks, and malls, where people often make inquiries and responses from personnel can take time. Inquiries in many of these settings, however, involve domain-specific vocabulary, and the organizations most likely lack the large amount of data or the computational resources needed to properly train a complex natural language processing (NLP) model. This paper proposes a method for creating a domain-specific virtual assistant that uses Generative Pre-trained Transformer 3 (GPT-3) to generate paraphrases from a relatively small dataset, and a Sentence Transformer (SBERT) model with a distilled BERT (DistilBERT) base, pretrained on the Quora Question Pairs dataset and fine-tuned on the augmented dataset. The method is evaluated on the MS MARCO, SemEval, and PubMed datasets using mean average precision (MAP), precision at k (P@k), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR) as performance metrics. The method was also demonstrated on a small dataset of 188 frequently asked questions from the De La Salle University website, which includes domain-specific vocabulary. The fine-tuned model was demonstrated on a simple webpage, and the results were found to be satisfactory.
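The four ranking metrics named in the description can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the binary relevance labels, and the toy rankings are assumptions made for illustration, and the sketch assumes every relevant item for a query appears somewhere in its ranked list.

```python
import math
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int]) -> float:
    """Mean of the precision values at each rank holding a relevant item."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rels: Sequence[int], k: int) -> float:
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    def dcg(xs: Sequence[int]) -> float:
        return sum(x / math.log2(rank + 1) for rank, x in enumerate(xs, start=1))

    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical relevance labels (1 = relevant) for two ranked result lists.
rankings = [[0, 1, 0, 1], [1, 0, 0, 0]]

map_score = mean([average_precision(r) for r in rankings])
mrr_score = mean([reciprocal_rank(r) for r in rankings])
p_at_2 = mean([precision_at_k(r, 2) for r in rankings])
ndcg_at_4 = mean([ndcg_at_k(r, 4) for r in rankings])
print(map_score, mrr_score, p_at_2, ndcg_at_4)
```

MAP and MRR average a per-query score over all queries, so a single query whose first relevant answer is ranked low pulls both down; NDCG additionally discounts each hit by the logarithm of its rank.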
format text
author Roque, Matthew Theodore C.
author_facet Roque, Matthew Theodore C.
author_sort Roque, Matthew Theodore C.
title A domain specific virtual assistant using paraphrase generation for data augmentation and Sentence transformers on limited data
title_sort domain specific virtual assistant using paraphrase generation for data augmentation and sentence transformers on limited data
publisher Animo Repository
publishDate 2023
url https://animorepository.dlsu.edu.ph/etdm_ece/28
https://animorepository.dlsu.edu.ph/context/etdm_ece/article/1028/viewcontent/A_Domain2_Specific_Virtual_Assistant_Using_Paraphrase_Generation_f.pdf
_version_ 1779260465175592960