Developing a graphene Q&A chatbot using retrieval augmented generation (RAG)

Bibliographic Details
Main Author: Sara Johari
Other Authors: Leonard Ng Wei Tat
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Engineering
Online Access:https://hdl.handle.net/10356/175994
Institution: Nanyang Technological University
School: School of Materials Science and Engineering
Contact: leonard.ngwt@ntu.edu.sg
Degree: Bachelor's degree
Date available: 2024-05-12
File format: PDF
Citation: Sara Johari (2024). Developing a graphene Q&A chatbot using retrieval augmented generation (RAG). Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175994

Description

Graphene synthesis is a rapidly growing market, with various methods suited to different applications. However, cost-effective and environmentally sustainable mass production of high-quality graphene has not yet been established commercially, and current synthesis techniques also face reproducibility issues. Meanwhile, the proliferation of artificial intelligence (AI), with ever-evolving large language models (LLMs) and the emergence of the Retrieval Augmented Generation (RAG) approach, has demonstrated a significant ability to produce natural responses drawing on a vast amount of knowledge. There is therefore interest in combining a graphene synthesis database with AI to assist the research process.

This experimental study used UMAP visualizations to determine the optimal chunk size and overlap. Two LLMs, DRAGON Deci-7B and DRAGON Mistral-7B, were then tested within a RAG question-answering chatbot architecture. The chatbots were further evaluated with two advanced retrieval methods: the parent document retriever and the ensemble retriever. The chatbots were scored with RAGAs, a performance metric framework, with ChatGPT as a benchmark, on a synthetic dataset of 10 questions and corresponding ground truths. Human evaluation was also conducted by manually entering a user prompt into the chatbots and analysing the generated responses.

In summary, the LLM evaluations with ChatGPT identified the optimal chatbot configuration in this study. That configuration used the DRAGON Mistral-7B LLM with the parent document retrieval method, an embedded chunk size of 256 tokens, and a 10% overlap. However, human evaluations raised concerns about the chatbot's actual usability. Further troubleshooting and refinement would be necessary, but the costs associated with the project constrained this work.
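The abstract reports that the best configuration embedded chunks of 256 tokens with a 10% overlap. The Python function below is a minimal sketch of what that setting means. It slides a 256-token window across a token list so that consecutive chunks share 10% of their tokens, which is 25 tokens after rounding down. The sketch assumes whitespace-split words as a stand-in for the real tokenizer, which the record does not name. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=256, overlap_ratio=0.10):
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Hypothetical helper: the record only states 256-token chunks with 10% overlap.
    """
    overlap = int(chunk_size * overlap_ratio)  # 25 tokens for 256 at 10%, rounded down (assumption)
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 231 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # last window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace splitting stands in for a real subword tokenizer.
    words = ("graphene " * 600).split()
    chunks = chunk_tokens(words)
    print(len(chunks), [len(c) for c in chunks])  # prints: 3 [256, 256, 138]
```

The overlap means that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. The cost is a little redundant storage and embedding work.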
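A parent document retriever searches over small child chunks, which tend to match a query precisely. It then returns the larger parent chunk that each match came from, so the LLM receives more surrounding context. The sketch below illustrates the idea under several assumptions. A toy bag-of-words cosine similarity replaces the embedding model, and parents are split into 8-word children. The class `ParentDocumentIndex` and the example passages are hypothetical. The record does not say which library or parameters the project used. LangChain, for instance, provides a `ParentDocumentRetriever` class built on the same pattern.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: a toy stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_words(text, size):
    """Split text into consecutive child chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ParentDocumentIndex:
    """Hypothetical index: match small child chunks, return their larger parents."""

    def __init__(self, child_size=8):
        self.child_size = child_size
        self.parents = []   # parent_id -> parent text
        self.children = []  # list of (parent_id, child vector)

    def add(self, parent_text):
        pid = len(self.parents)
        self.parents.append(parent_text)
        for child in split_words(parent_text, self.child_size):
            self.children.append((pid, bow(child)))

    def retrieve(self, query, k=1):
        q = bow(query)
        # Rank every child chunk by similarity to the query.
        ranked = sorted(self.children, key=lambda c: cosine(q, c[1]), reverse=True)
        # Keep the first k distinct parents in ranked order.
        pids = []
        for pid, _ in ranked:
            if pid not in pids:
                pids.append(pid)
            if len(pids) == k:
                break
        return [self.parents[pid] for pid in pids]


if __name__ == "__main__":
    index = ParentDocumentIndex(child_size=8)
    index.add("Chemical vapour deposition grows graphene films on copper foil at high temperature")
    index.add("Liquid phase exfoliation disperses graphite in a solvent and uses sonication")
    print(index.retrieve("How is graphene grown on copper?"))
```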
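An ensemble retriever merges the ranked results of several retrievers, for example a keyword retriever and a dense embedding retriever. One common fusion rule is weighted Reciprocal Rank Fusion (RRF), which is also the method used by LangChain's `EnsembleRetriever`. Under RRF, each document scores the sum of w / (k + rank) over the lists it appears in. The record does not state which retrievers, weights, or fusion rule the project used. The equal weights, the constant k = 60, and the document IDs below are therefore assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Assumptions: equal weights by default and the commonly used constant k = 60.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += w / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword-retriever ranking
    dense_hits = ["b", "c", "d"]    # hypothetical embedding-retriever ranking
    print(reciprocal_rank_fusion([keyword_hits, dense_hits]))
```

RRF uses only ranks, not raw scores, so it can combine retrievers whose similarity scores are on incompatible scales. A document ranked moderately well by both retrievers ("b" here) can overtake one ranked first by only a single retriever.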