Developing a graphene Q&A chatbot using retrieval augmented generation (RAG)
Graphene synthesis is a rapidly growing market with various methods for different applications. However, the mass production of high-quality graphene that is cost-effective and environmentally sustainable has not been established commercially. Current graphene synthesis techniques also face issues r...
Main Author: | Sara Johari |
---|---|
Other Authors: | Leonard Ng Wei Tat |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Engineering |
Online Access: | https://hdl.handle.net/10356/175994 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175994 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-175994 2024-05-18T16:45:49Z Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) Sara Johari Leonard Ng Wei Tat School of Materials Science and Engineering leonard.ngwt@ntu.edu.sg Engineering Graphene synthesis is a rapidly growing market with various methods for different applications. However, the mass production of high-quality graphene that is cost-effective and environmentally sustainable has not been established commercially. Current graphene synthesis techniques also face issues related to reproducibility. Recently, the proliferation of artificial intelligence (AI) with ever-evolving large language models (LLMs), along with the emergence of the Retrieval Augmented Generation (RAG) approach, has demonstrated significant abilities to produce natural responses with a vast amount of knowledge. Therefore, there is an interest in combining the database of graphene synthesis with AI to substantially assist the research process. This experimental study tested the use of UMAP visualizations to determine the optimal chunk size and overlap. Subsequently, two LLMs, the DRAGON Deci-7B LLM and the DRAGON Mistral-7B LLM, were tested within a RAG question-answering chatbot architecture. The chatbots were then further evaluated with two advanced retrieval methods: the parent document retriever and the ensemble retriever. The chatbots were evaluated by RAGAs, a performance metric framework, with ChatGPT as a benchmark using a synthetic dataset of 10 questions and corresponding ground truths. Human evaluation was also conducted by manually inputting a user prompt into the chatbots and analysing the response generated. In summary, through LLM evaluations with ChatGPT, the optimal chatbot developed in this study utilized the DRAGON Mistral-7B LLM with the parent document retrieval method, with an embedded chunk size of 256 tokens and a 10% overlap. 
However, human evaluations raised concerns with regard to the actual usability of the chatbot. Further troubleshooting and refinement would be necessary, but this was constrained by the costs associated with the project. Bachelor's degree 2024-05-12T23:55:09Z 2024-05-12T23:55:09Z 2024 Final Year Project (FYP) Sara Johari (2024). Developing a graphene Q&A chatbot using retrieval augmented generation (RAG). Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175994 https://hdl.handle.net/10356/175994 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering |
spellingShingle |
Engineering Sara Johari Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
description |
Graphene synthesis is a rapidly growing market with various methods for different applications. However, the mass production of high-quality graphene that is cost-effective and environmentally sustainable has not been established commercially. Current graphene synthesis techniques also face issues related to reproducibility.
Recently, the proliferation of artificial intelligence (AI) with ever-evolving large language models (LLMs), along with the emergence of the Retrieval Augmented Generation (RAG) approach, has demonstrated significant abilities to produce natural responses with a vast amount of knowledge. Therefore, there is an interest in combining the database of graphene synthesis with AI to substantially assist the research process.
This experimental study tested the use of UMAP visualizations to determine the optimal chunk size and overlap. Subsequently, two LLMs, the DRAGON Deci-7B LLM and the DRAGON Mistral-7B LLM, were tested within a RAG question-answering chatbot architecture. The chatbots were then further evaluated with two advanced retrieval methods: the parent document retriever and the ensemble retriever. The chatbots were evaluated by RAGAs, a performance metric framework, with ChatGPT as a benchmark using a synthetic dataset of 10 questions and corresponding ground truths. Human evaluation was also conducted by manually inputting a user prompt into the chatbots and analysing the response generated.
In summary, through LLM evaluations with ChatGPT, the optimal chatbot developed in this study utilized the DRAGON Mistral-7B LLM with the parent document retrieval method, with an embedded chunk size of 256 tokens and a 10% overlap. However, human evaluations raised concerns with regard to the actual usability of the chatbot. Further troubleshooting and refinement would be necessary, but this was constrained by the costs associated with the project. |
author2 |
Leonard Ng Wei Tat |
author_facet |
Leonard Ng Wei Tat Sara Johari |
format |
Final Year Project |
author |
Sara Johari |
author_sort |
Sara Johari |
title |
Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
title_short |
Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
title_full |
Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
title_fullStr |
Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
title_full_unstemmed |
Developing a graphene Q&A chatbot using retrieval augmented generation (RAG) |
title_sort |
developing a graphene q&a chatbot using retrieval augmented generation (rag) |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175994 |
_version_ |
1814047247083503616 |
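
The description above reports a chunk size of 256 tokens with a 10% overlap as the preferred setting. The record does not say how the project tokenized or split its documents, so the following is only a minimal sketch of fixed-size chunking with overlap. The function name `chunk_tokens`, its parameters, and the use of a plain list of strings as "tokens" are assumptions made for illustration, not details taken from the project.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    chunk_size: int = 256,        # value reported in the record
    overlap_ratio: float = 0.10,  # 10% overlap reported in the record
) -> List[List[str]]:
    """Split a token list into fixed-size windows that share some tokens.

    Hypothetical helper: the real project may have used a library splitter
    and a model-specific tokenizer instead of a plain list of strings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    # Number of tokens shared by consecutive chunks (rounded down).
    overlap = int(chunk_size * overlap_ratio)
    # Distance between the starting positions of consecutive chunks.
    step = chunk_size - overlap

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Stand-in "tokens"; a real pipeline would tokenize document text.
    demo_tokens = [f"tok{i}" for i in range(600)]
    demo_chunks = chunk_tokens(demo_tokens)
    print(len(demo_chunks), [len(c) for c in demo_chunks])
```

With these defaults the overlap rounds down to 25 tokens, so each chunk starts 231 tokens after the previous one. Rounding down is a choice made for this sketch; the record does not say how the 10% figure was converted to a token count.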