Aligning vision and language for image captioning using deep learning
A longstanding objective in the field of multi-modal research uniting computer vision and natural language processing is to develop models that can comprehend the intricate relationship between vision and language. In recent years, we have witnessed notable developments directed towards this objecti...
Main Author: | Cai, Chen |
---|---|
Other Authors: | Yap Kim Hui |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Online Access: | https://hdl.handle.net/10356/181511 |
Institution: | Nanyang Technological University |
Similar Items
- Bridging images and natural language with deep learning
  by: Gu, Jiuxiang
  Published: (2019)
- Deep learning for x-ray vision
  by: Ng, Kenneth Chen Ee
  Published: (2021)
- Diffusion models for natural language processing
  by: Hoang, Minh Nhat
  Published: (2024)
- Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites
  by: Wang, Lei, et al.
  Published: (2024)
- Generative image captioning in Urdu using deep learning
  by: Afzal, M.K.
  Published: (2023)