Aligning vision and language for image captioning using deep learning
A longstanding objective in the field of multi-modal research uniting computer vision and natural language processing is to develop models that can comprehend the intricate relationship between vision and language. In recent years, we have witnessed notable developments directed towards this objecti...
Saved in:
主要作者: | |
---|---|
其他作者: | |
格式: | Thesis-Doctor of Philosophy |
語言: | English |
出版: |
Nanyang Technological University
2024
|
主題: | |
在線閱讀: | https://hdl.handle.net/10356/181511 |
標簽: |
添加標簽
沒有標簽, 成為第一個標記此記錄!
|
機構: | Nanyang Technological University |
語言: | English |