Automatic video assistant based on speech recognition and natural language processing

In modern life, people need to interact with various types of videos. In such a scenario, a tool capable of summarizing videos and answering related questions would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistan...

Bibliographic Details
Main Author: Zhou, Kaiyu
Other Authors: Tan Yap Peng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/176430
Institution: Nanyang Technological University
id sg-ntu-dr.10356-176430
record_format dspace
spelling sg-ntu-dr.10356-176430 2024-05-17T15:44:19Z Automatic video assistant based on speech recognition and natural language processing Zhou, Kaiyu Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Computer and Information Science In modern life, people interact with many kinds of video. A tool that can summarize videos and answer questions about them would improve efficiency for people across industries. This project builds an automatic video assistant that generates video summaries and answers questions about video content. First, the video's audio track is transcribed into text with a speech recognition model. A large language model, integrated with LangChain, then summarizes the transcript and supports dialogue backed by text retrieval. The study evaluates state-of-the-art speech recognition models and quantized open-source large language models. The selected models are deployed with Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related questions. Bachelor's degree 2024-05-16T13:12:34Z 2024-05-16T13:12:34Z 2024 Final Year Project (FYP) Zhou, K. (2024). Automatic video assistant based on speech recognition and natural language processing. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176430 https://hdl.handle.net/10356/176430 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Zhou, Kaiyu
Automatic video assistant based on speech recognition and natural language processing
description In modern life, people interact with many kinds of video. A tool that can summarize videos and answer questions about them would improve efficiency for people across industries. This project builds an automatic video assistant that generates video summaries and answers questions about video content. First, the video's audio track is transcribed into text with a speech recognition model. A large language model, integrated with LangChain, then summarizes the transcript and supports dialogue backed by text retrieval. The study evaluates state-of-the-art speech recognition models and quantized open-source large language models. The selected models are deployed with Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related questions.
author2 Tan Yap Peng
author_facet Tan Yap Peng
Zhou, Kaiyu
format Final Year Project
author Zhou, Kaiyu
author_sort Zhou, Kaiyu
title Automatic video assistant based on speech recognition and natural language processing
title_sort automatic video assistant based on speech recognition and natural language processing
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/176430
_version_ 1814047190544285696
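
Illustrative sketch (not part of the catalog record): the description above outlines a pipeline in which a transcript is produced by speech recognition and then used for summarization and retrieval-backed dialogue. The Python below is a minimal sketch of the retrieval step only, assuming the transcript already exists as a plain string. It is not the author's implementation and calls no speech recognition, LangChain, or Streamlit API. The function names (chunk_transcript, retrieve, build_prompt), the window sizes, and the word-overlap scoring are all hypothetical; the scoring is a toy stand-in for whatever retrieval method the project actually used.

```python
import re
from collections import Counter


def chunk_transcript(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split a transcript into overlapping windows of words (sizes are assumptions)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the transcript.
        if start + max_words >= len(words):
            break
    return chunks


def _tokens(text: str) -> Counter:
    """Lowercase bag of words; a real system would likely use embeddings instead."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most words with the question."""
    q = _tokens(question)
    scored = [(sum((q & _tokens(c)).values()), i) for i, c in enumerate(chunks)]
    # Highest overlap first; ties keep transcript order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in scored[:k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to a language model."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the transcript excerpts below.\n\n"
        f"{joined}\n\nQuestion: {question}\nAnswer:"
    )
```

In use, a transcript would be passed through chunk_transcript once, and each user question would go through retrieve and build_prompt before being sent to the chosen language model. Overlapping windows are used so that a sentence cut at a chunk boundary still appears whole in at least one chunk.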