Automatic video assistant based on speech recognition and natural language processing

In modern life, people need to interact with various types of videos. In such a scenario, a tool capable of summarizing videos and answering related questions would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistan...

Bibliographic Details
Main Author:	Zhou, Kaiyu
Other Authors:	Tan Yap Peng
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/176430
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-176430
record_format	dspace
spelling	sg-ntu-dr.10356-1764302024-05-17T15:44:19Z Automatic video assistant based on speech recognition and natural language processing Zhou, Kaiyu Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Computer and Information Science In modern life, people need to interact with various types of videos. In such a scenario, a tool capable of summarizing videos and answering related questions would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistant to generate video summaries and answer questions related to the video content. Initially, video audio is transcribed into text using a speech recognition model. Subsequently, a large language model, integrated with LangChain, is utilized for subsequent summarization and dialogue with text retrieval. This study evaluates the performance of state-of-the-art speech recognition models and quantified, open-source large language models. The selected models are deployed using Streamlit. The final application enables the summarization of local and YouTube videos and serves as a chatbot for video-related inquiries. Bachelor's degree 2024-05-16T13:12:34Z 2024-05-16T13:12:34Z 2024 Final Year Project (FYP) Zhou, K. (2024). Automatic video assistant based on speech recognition and natural language processing. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176430 https://hdl.handle.net/10356/176430 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science
spellingShingle	Computer and Information Science Zhou, Kaiyu Automatic video assistant based on speech recognition and natural language processing
description	In modern life, people need to interact with many types of videos. A tool that can summarize videos and answer related questions would significantly improve efficiency for people across different industries. This project aims to build an automatic video assistant that generates video summaries and answers questions about the video content. First, the video's audio is transcribed into text using a speech recognition model. A large language model, integrated with LangChain, then performs summarization and dialogue backed by text retrieval over the transcript. This study evaluates the performance of state-of-the-art speech recognition models and quantized, open-source large language models. The selected models are deployed using Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related inquiries.
author2	Tan Yap Peng
author_facet	Tan Yap Peng Zhou, Kaiyu
format	Final Year Project
author	Zhou, Kaiyu
author_sort	Zhou, Kaiyu
title	Automatic video assistant based on speech recognition and natural language processing
title_short	Automatic video assistant based on speech recognition and natural language processing
title_full	Automatic video assistant based on speech recognition and natural language processing
title_fullStr	Automatic video assistant based on speech recognition and natural language processing
title_full_unstemmed	Automatic video assistant based on speech recognition and natural language processing
title_sort	automatic video assistant based on speech recognition and natural language processing
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/176430
_version_	1814047190544285696
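
The description in this record mentions dialogue with text retrieval over a transcript. As a rough illustration of that retrieval step only, the following is a minimal sketch in plain Python. It is not the project's implementation: the record does not say how retrieval is configured, so the function names (`split_into_chunks`, `retrieve`), the chunk sizes, and the bag-of-words cosine scoring are all assumptions chosen to keep the example self-contained. An actual system would more likely use embedding-based retrieval through LangChain.

```python
import math
import re
from collections import Counter


def split_into_chunks(transcript, chunk_words=40, overlap=10):
    """Split a transcript into overlapping word windows (hypothetical sizes)."""
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")
    words = transcript.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reaches the end of the transcript
    return chunks


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question.

    Stand-in for embedding-based retrieval; scoring here is word overlap only.
    """
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, bag_of_words(c)), reverse=True)
    return ranked[:top_k]
```

In a pipeline like the one described, the retrieved chunks would be passed to the language model together with the user's question, so that answers are grounded in the relevant part of the transcript rather than in the whole video.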
