Aligning vision and language for image captioning using deep learning
A longstanding objective in the field of multi-modal research uniting computer vision and natural language processing is to develop models that can comprehend the intricate relationship between vision and language. In recent years, we have witnessed notable developments directed towards this objecti...
Main Author: | Cai, Chen |
---|---|
Other Authors: | Yap Kim Hui |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Online Access: | https://hdl.handle.net/10356/181511 |
Institution: | Nanyang Technological University |
Similar Items
- Bridging images and natural language with deep learning
  By: Gu, Jiuxiang
  Published: (2019)
- Deep learning for x-ray vision
  By: Ng, Kenneth Chen Ee
  Published: (2021)
- Diffusion models for natural language processing
  By: Hoang, Minh Nhat
  Published: (2024)
- Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites
  By: WANG, Lei, et al.
  Published: (2024)
- Generative image captioning in Urdu using deep learning
  By: Afzal M.K.
  Published: (2023)