Automatic video assistant based on speech recognition and natural language processing
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/176430 |
Institution: | Nanyang Technological University |
Summary: | People routinely work with many kinds of video, so a tool that summarizes videos and answers questions about them would significantly improve efficiency across industries. This project builds an automatic video assistant that generates video summaries and answers questions about video content. A speech recognition model first transcribes the video's audio into text. A large language model, integrated with LangChain, then summarizes the transcript and supports dialogue through text retrieval. The study evaluates state-of-the-art speech recognition models and quantized, open-source large language models, and the selected models are deployed with Streamlit. The final application summarizes local and YouTube videos and serves as a chatbot for video-related questions. |
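The retrieval step described in the summary (selecting the transcript passages relevant to a question before passing them to a language model) can be illustrated with a minimal sketch. This is not the project's implementation: the record does not describe its code, so every function name here (`chunk_transcript`, `retrieve`, `build_prompt`) is hypothetical, and simple word overlap stands in for the embedding-based retrieval that a LangChain pipeline would typically use. The transcript is assumed to have already been produced by a speech recognition model.

```python
import re
from collections import Counter


def chunk_transcript(transcript: str, max_words: int = 40) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks ranked by word overlap with the question.

    Assumption: word overlap is a simplified stand-in for vector similarity.
    Chunks with no overlap are dropped; ties keep transcript order.
    """
    question_counts = Counter(tokenize(question))
    scored = []
    for idx, chunk in enumerate(chunks):
        chunk_counts = Counter(tokenize(chunk))
        score = sum(min(question_counts[w], chunk_counts[w]) for w in question_counts)
        scored.append((score, -idx, chunk))
    scored.sort(reverse=True)
    return [chunk for score, _, chunk in scored[:top_k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to a language model (hypothetical format)."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this transcript context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    transcript = (
        "speech recognition converts audio into text while "
        "retrieval finds relevant passages for questions"
    )
    chunks = chunk_transcript(transcript, max_words=5)
    question = "How does retrieval work?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a real deployment the prompt would be passed to the chosen language model; the sketch stops at prompt construction so that it runs without any model or external library.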