Automatic video assistant based on speech recognition and natural language processing

Bibliographic Details
Main Author: Zhou, Kaiyu
Other Authors: Tan Yap Peng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/176430
Institution: Nanyang Technological University
Description
Summary: In modern life, people interact with many kinds of videos. A tool that can summarize videos and answer questions about them would significantly improve the efficiency of individuals across different industries. This project aims to build an automatic video assistant that generates video summaries and answers questions about video content. First, the video's audio is transcribed into text using a speech recognition model. A large language model, integrated with LangChain, then performs summarization and retrieval-based dialogue over the transcript. The study evaluates the performance of state-of-the-art speech recognition models and quantized, open-source large language models. The selected models are deployed using Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related questions.
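
The retrieval step described in the summary can be illustrated with a minimal, standard-library sketch. It is not the author's implementation: the record does not show how LangChain is configured, so the function names below (chunk_transcript, retrieve, build_prompt) are hypothetical, and plain word overlap stands in for whatever similarity search the project actually uses. The speech recognition and language model calls are omitted; the sketch only shows how a transcript might be split, searched, and packed into a prompt.

```python
import re
from collections import Counter


def chunk_transcript(text, max_words=50, overlap=10):
    """Split a transcript into overlapping windows of words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the transcript
    return chunks


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, chunks, top_k=2):
    """Rank chunks by word overlap with the question.

    Assumption: word overlap is a stand-in for the similarity search
    a real retrieval pipeline would perform.
    """
    question_counts = Counter(tokenize(question))
    scored = []
    for index, chunk in enumerate(chunks):
        chunk_counts = Counter(tokenize(chunk))
        score = sum(min(question_counts[w], chunk_counts[w]) for w in question_counts)
        scored.append((score, -index, chunk))  # -index keeps earlier chunks first on ties
    scored.sort(reverse=True)
    return [chunk for score, _, chunk in scored[:top_k] if score > 0]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to a language model."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the transcript excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In use, the transcript produced by the speech recognition model would be passed to chunk_transcript once, and each user question would go through retrieve and build_prompt before the prompt is handed to the language model. Overlapping windows are one common way to avoid cutting a relevant sentence in half at a chunk boundary.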