Neural image and video captioning

In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machin...

Bibliographic Details
Main Author: Lam, Ting En
Other Authors: Hanwang Zhang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/175286
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175286
record_format dspace
spelling sg-ntu-dr.10356-175286 2024-04-26T15:43:34Z Neural image and video captioning Lam, Ting En Hanwang Zhang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg Computer and Information Science In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machine Learning, Computer Vision and Natural Language Processing to bridge the gap between human and computer understanding of visual content by generating descriptive captions from it. In this project, the effectiveness of various image captioning models is evaluated to identify optimal frameworks for textual description generation. Subsequently, a video captioning model capable of generating multimodal captions for video content is developed. The proposed image and video captioning models are evaluated using standard metrics and a human evaluation study. Additionally, the models are deployed in a user-friendly application. Overall, this study seeks to improve video captioning performance and foster further advancements in this field. Bachelor's degree 2024-04-22T08:35:17Z 2024-04-22T08:35:17Z 2024 Final Year Project (FYP) Lam, T. E. (2024). Neural image and video captioning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175286 https://hdl.handle.net/10356/175286 en SCSE23-0211 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Lam, Ting En
Neural image and video captioning
description In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machine Learning, Computer Vision and Natural Language Processing to bridge the gap between human and computer understanding of visual content by generating descriptive captions from it. In this project, the effectiveness of various image captioning models is evaluated to identify optimal frameworks for textual description generation. Subsequently, a video captioning model capable of generating multimodal captions for video content is developed. The proposed image and video captioning models are evaluated using standard metrics and a human evaluation study. Additionally, the models are deployed in a user-friendly application. Overall, this study seeks to improve video captioning performance and foster further advancements in this field.
author2 Hanwang Zhang
author_facet Hanwang Zhang
Lam, Ting En
format Final Year Project
author Lam, Ting En
author_sort Lam, Ting En
title Neural image and video captioning
title_short Neural image and video captioning
title_full Neural image and video captioning
title_fullStr Neural image and video captioning
title_full_unstemmed Neural image and video captioning
title_sort neural image and video captioning
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175286
_version_ 1814047055103918080