Automatic video assistant based on speech recognition and natural language processing
In modern life, people need to interact with various types of videos. In such a scenario, a tool capable of summarizing videos and answering related questions would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistan...
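The description in this record outlines a pipeline in which a transcript produced by speech recognition is passed to a language model for question answering with text retrieval. The following is a minimal sketch of the retrieval step only, under stated assumptions: the function names (`split_into_chunks`, `retrieve`, `build_prompt`) are hypothetical, bag-of-words cosine similarity stands in for whatever retriever the project actually uses, and the speech recognition and language model calls are left out because the record does not name the models.

```python
import math
import re
from collections import Counter


def split_into_chunks(transcript: str, max_words: int = 40) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k transcript chunks most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(bag_of_words(c), query), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to a language model (the model call is omitted)."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this transcript context:\n{context}\n\nQuestion: {question}"
```

In a full application the transcript would come from the speech recognition model and the prompt would be sent to the language model; here both ends are left open so the sketch stays self-contained.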
Main Author: | Zhou, Kaiyu |
---|---|
Other Authors: | Tan Yap Peng |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/176430 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-176430 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1764302024-05-17T15:44:19Z Automatic video assistant based on speech recognition and natural language processing Zhou, Kaiyu Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Computer and Information Science In modern life, people need to interact with various types of videos. In such a scenario, a tool capable of summarizing videos and answering related questions would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistant to generate video summaries and answer questions related to the video content. Initially, video audio is transcribed into text using a speech recognition model. Subsequently, a large language model, integrated with LangChain, is utilized for subsequent summarization and dialogue with text retrieval. This study evaluates the performance of state-of-the-art speech recognition models and quantified, open-source large language models. The selected models are deployed using Streamlit. The final application enables the summarization of local and YouTube videos and serves as a chatbot for video-related inquiries. Bachelor's degree 2024-05-16T13:12:34Z 2024-05-16T13:12:34Z 2024 Final Year Project (FYP) Zhou, K. (2024). Automatic video assistant based on speech recognition and natural language processing. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176430 https://hdl.handle.net/10356/176430 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
spellingShingle |
Computer and Information Science Zhou, Kaiyu Automatic video assistant based on speech recognition and natural language processing |
description |
In modern life, people interact with many types of video. A tool that can summarize videos and answer questions about them would significantly improve efficiency for people across different industries. This project builds an automatic video assistant that generates video summaries and answers questions about the video content. First, the video's audio is transcribed into text using a speech recognition model. A large language model, integrated with LangChain, is then used for summarization and for dialogue with text retrieval. This study evaluates the performance of state-of-the-art speech recognition models and quantized, open-source large language models. The selected models are deployed using Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related inquiries. |
author2 |
Tan Yap Peng |
author_facet |
Tan Yap Peng Zhou, Kaiyu |
format |
Final Year Project |
author |
Zhou, Kaiyu |
author_sort |
Zhou, Kaiyu |
title |
Automatic video assistant based on speech recognition and natural language processing |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/176430 |
_version_ |
1814047190544285696 |