Neural image and video captioning
In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machin...
Main Author: | Lam, Ting En |
---|---|
Other Authors: | Hanwang Zhang |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/175286 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-175286 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-175286; 2024-04-26T15:43:34Z; Neural image and video captioning; Lam, Ting En; Hanwang Zhang; School of Computer Science and Engineering; hanwangzhang@ntu.edu.sg; Computer and Information Science; Bachelor's degree; 2024-04-22T08:35:17Z; 2024; Final Year Project (FYP); Lam, T. E. (2024). Neural image and video captioning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175286; en; SCSE23-0211; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science |
description |
In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machine Learning, Computer Vision and Natural Language Processing to bridge the gap between human and computer understanding of visual content by generating descriptive captions for it.
In this project, the effectiveness of various image captioning models is evaluated to identify optimal frameworks for textual description generation. Subsequently, a video captioning model capable of generating multimodal captions for video content is developed. The proposed image and video captioning models are evaluated using standard metrics, and a human evaluation study is conducted. Additionally, the models are deployed in a user-friendly application.
Overall, this study seeks to improve video captioning performance and foster further advancements in this field. |
author2 |
Hanwang Zhang |
author_facet |
Hanwang Zhang Lam, Ting En |
format |
Final Year Project |
author |
Lam, Ting En |
title |
Neural image and video captioning |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/175286 |
_version_ |
1814047055103918080 |