AimigoTutor - tutoring application using multi-modal capabilities
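Video captioning has been an up-and-coming research topic. Thanks to the recent advances in the performance of deep neural networks, especially with transformers, video captioning is seeing a huge potential improvement in accuracy and versatility. Most state-of-the-art video captioning models employ a multi-modal approach, whereby both the visual information of the video frames and the audio information of the video are used to extract the semantic meaning of the video. This project will explore the capability of multi-modal video captioning in a much-needed context: building a video tutoring application for students, called AimigoTutor. This report will discuss the requirements, design, implementation and evaluation of the application.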

Main Author: | Nguyen, Viet Hoang |
---|---|
Other Authors: | Hanwang Zhang |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science; Multi-modal |
Online Access: | https://hdl.handle.net/10356/175732 |
Institution: | Nanyang Technological University |

id | sg-ntu-dr.10356-175732 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-175732; 2024-05-10T15:40:40Z; AimigoTutor - tutoring application using multi-modal capabilities; Nguyen, Viet Hoang; Hanwang Zhang; School of Computer Science and Engineering; hanwangzhang@ntu.edu.sg; Computer and Information Science; Multi-modal; Video captioning has been an up-and-coming research topic. Thanks to the recent advances in the performance of deep neural networks, especially with transformers, video captioning is seeing a huge potential improvement in accuracy and versatility. Most state-of-the-art video captioning models employ a multi-modal approach, whereby both the visual information of the video frames and the audio information of the video are used to extract the semantic meaning of the video. This project will explore the capability of multi-modal video captioning in a much-needed context: building a video tutoring application for students, called AimigoTutor. This report will discuss the requirements, design, implementation and evaluation of the application.; Bachelor's degree; 2024-05-06T01:46:25Z; 2024-05-06T01:46:25Z; 2024; Final Year Project (FYP); Nguyen, V. H. (2024). AimigoTutor - tutoring application using multi-modal capabilities. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175732; https://hdl.handle.net/10356/175732; en; SCSE23-0209; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Computer and Information Science; Multi-modal |
description | Video captioning has been an up-and-coming research topic. Thanks to the recent advances in the performance of deep neural networks, especially with transformers, video captioning is seeing a huge potential improvement in accuracy and versatility. Most state-of-the-art video captioning models employ a multi-modal approach, whereby both the visual information of the video frames and the audio information of the video are used to extract the semantic meaning of the video. This project will explore the capability of multi-modal video captioning in a much-needed context: building a video tutoring application for students, called AimigoTutor. This report will discuss the requirements, design, implementation and evaluation of the application. |
author2 | Hanwang Zhang |
format | Final Year Project |
author | Nguyen, Viet Hoang |
title | AimigoTutor - tutoring application using multi-modal capabilities |
publisher | Nanyang Technological University |
publishDate | 2024 |
url | https://hdl.handle.net/10356/175732 |