Aligning vision and language for image captioning using deep learning

A longstanding objective in the field of multi-modal research uniting computer vision and natural language processing is to develop models that can comprehend the intricate relationship between vision and language. In recent years, we have witnessed notable developments directed towards this objecti...

Bibliographic Details
Main Author:	Cai, Chen
Other Authors:	Yap Kim Hui
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Computer vision; Natural language processing
Online Access:	https://hdl.handle.net/10356/181511
Institution:	Nanyang Technological University

Similar Items

Bridging images and natural language with deep learning
by: Gu, Jiuxiang
Published: (2019)

Deep learning for x-ray vision
by: Ng, Kenneth Chen Ee
Published: (2021)

Diffusion models for natural language processing
by: Hoang, Minh Nhat
Published: (2024)

Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites
by: WANG, Lei, et al.
Published: (2024)

Generative image captioning in Urdu using deep learning
by: Afzal M.K.
Published: (2023)

Image artefact removal using deep learning
by: Sanchari, Das
Published: (2022)

Deep learning-based image captioning
by: Chong, Kaydon
Published: (2019)

Highly controllable human motion generation model
by: Huang, Jingfang
Published: (2024)

Empowering natural language processing in low-resource regimes
by: Feng, Zijian
Published: (2025)

Deep learning for medical image analysis
by: Yang, Ivan Sze Yuan
Published: (2020)

Semantic, syntactic and joint deep learning of event extraction
by: Hao, Anran
Published: (2025)

Natural language generator for SUMO
by: Ureta, Danielle Erika Y.
Published: (2012)

Image and video generation via deep learning
by: Jiang, Liming
Published: (2023)

Cross-modal graph with meta concepts for video captioning
by: Wang, Hao, et al.
Published: (2022)

Image preprocessing using quick color averaging approach for color machine vision (CMV) systems
by: Luta, Raphael Benedict G., et al.
Published: (2017)

Punctuation restoration for speech transcripts using large language models
by: Liu, Changsong
Published: (2024)

System reliability enhancement via deep-driven computer vision
by: Ding, Shuya
Published: (2021)

Deep disentangling learning for real-world image enlightening and restoration
by: Chan, Yi Xuan
Published: (2022)

Enhancing contextual understanding in NLP: adapting state-of-the-art models for improved sentiment analysis of informal language
by: Sneha Ravisankar
Published: (2024)

Benchmarking embedded deep learning hardware for computer vision
by: Ching, Amos Li En
Published: (2020)

Neural image and video captioning
by: Lam, Ting En
Published: (2024)

Image quality assessment based label smoothing in deep neural network learning
by: Chen, Zhou
Published: (2018)

Deep learning for human motion generation
by: Gu, Chenyang
Published: (2024)

Tracking human mobility using Twitter through natural language processing techniques
by: Ver, Andrea Nicole O.
Published: (2018)

Image processing algorithms for dynamic vision sensors
by: Wang, Lun
Published: (2023)

Federated learning for natural language processing in medical domain
by: Saraf, Ishita
Published: (2024)

INDONESIAN IMAGE CAPTIONING USING VISION-LANGUAGE MODEL
by: Astrada Fathurrahman, Raihan

Use of word and character N-grams for low-resourced local languages
by: Regalado, Ralph Vincent, et al.
Published: (2019)

Crowd monitoring using deep learning
by: Tan, Raymond Rui Ming
Published: (2021)

Image recognition based on deep learning of convolutional neural networks
by: Xie, Cong
Published: (2019)

Deep image enhancement
by: Han, Jun
Published: (2021)

Benchmarking neuromorphic vision: Lessons learnt from computer vision
by: Tan, C, et al.
Published: (2020)

Facial expression recognition using deep learning
by: Wang, Xiao Yi
Published: (2024)

Sentiment analysis of the Burmese language using the distributed representation of n-gram-based words
by: Myat lay phyu
Published: (2023)

Fine-grained image classification using deep learning
by: Sun, Deguang
Published: (2022)

Communicating effectively with the hearing impaired
by: Cheng, Eddy Kuan Quan
Published: (2024)

Automated image quality assessment and its applications in computer vision
by: Zhou, Phoebe Huixin
Published: (2022)

Identification of foreign materials in food using passive terahertz imaging and deep learning
by: Ong, Eng Zia
Published: (2022)

Emergent semantic segmentation: training-free dense-label-free extraction from vision-language models
by: Luo, Jiayun
Published: (2024)

Classification of white blood cells using deep learning
by: Zhang, Mengxin
Published: (2022)