Boundary detection of instructional video by speech

Bibliographic Details
Main Author: Tan, Brandon Jun Kai
Other Authors: Yeo Chai Kiat
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175004
Institution: Nanyang Technological University
Description
Summary: In education and online learning, finding relevant information efficiently in instructional videos can be challenging because the videos lack structured navigation aids. This research proposes a method to enhance the learning experience by automatically generating meaningful timestamps, each accompanied by a succinct description, for instructional videos. The approach converts audio to text using speech-to-text technology and then applies Natural Language Processing (NLP) techniques to identify key moments in the transcribed content. Several approaches, including spaCy and MPNet, were explored to analyze semantic nuances and transitions in the video content, but they yielded poor results. Large Language Models (LLMs) were therefore used for their ability to discern sentence semantics and intent. The study used datasets from HowTo100M and YouTube and evaluated the accuracy of the proposed method with metrics such as precision, recall, and missing steps. The results are promising: the model shows competitive performance, particularly in precision and recall for certain instructional tasks. The final product includes a user-friendly interface through which users can access timestamps and descriptions for educational content. Overall, this research contributes to the accessibility and usability of instructional videos, enhancing the learning experience for users worldwide.
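
The summary names precision, recall, and missing steps as evaluation metrics but does not say how predicted timestamps were matched to reference ones. The following is a minimal sketch of one plausible scheme, not the project's actual procedure: the function name `evaluate_boundaries`, the greedy one-to-one matching, and the 5-second `tolerance` are all assumptions made for illustration.

```python
from typing import List, Tuple


def evaluate_boundaries(
    predicted: List[float],
    reference: List[float],
    tolerance: float = 5.0,  # assumed matching window in seconds, not from the source
) -> Tuple[float, float, int]:
    """Return (precision, recall, missing_steps) for boundary timestamps in seconds.

    A predicted timestamp counts as a hit if it lies within `tolerance`
    of a reference timestamp that has not already been matched.
    """
    unmatched = sorted(reference)
    hits = 0
    for p in sorted(predicted):
        # Pick the closest still-unmatched reference boundary within the window.
        candidates = [r for r in unmatched if abs(r - p) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda r: abs(r - p))
            unmatched.remove(best)
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    missing_steps = len(unmatched)  # reference steps with no matching prediction
    return precision, recall, missing_steps


if __name__ == "__main__":
    # Hypothetical timestamps, chosen only to exercise the function.
    print(evaluate_boundaries([10.0, 62.0, 200.0], [12.0, 60.0, 120.0, 180.0]))
```

In this hypothetical call, two of the three predictions fall within 5 seconds of a reference boundary, so two reference steps remain unmatched and are counted as missing.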