Are vision language models multimodal learners?
Since the release of accessible vision language models (VLMs) such as GPT-4V and Gemini Pro in 2023, scholars have envisaged using these artificial intelligence (AI) models to support instructors and learners at scale. In particular, their capability to process visual and textual data simultaneously...
| Field | Value |
|---|---|
| Main Author | Lee, Gyeonggeon |
| Other Authors | School of Mechanical and Aerospace Engineering |
| Format | Conference or Workshop Item |
| Language | English |
| Published | 2024 |
| Online Access | https://hdl.handle.net/10356/181109 https://www.ntu.edu.sg/mae/ai-education-singapore-2024/activities/keynote-invited-talk#Content_C021_Col00 |
| Institution | Nanyang Technological University |
Similar Items
- Vision-language-model-based video quality assessment, by Zhang, Erli (2024)
- Vision language representation learning, by Yang, Xiaofeng (2023)
- Connecting Language and Vision for Natural Language-Based Vehicle Retrieval, by Bai, Shuai, et al. (2023)
- Language-guided object segmentation, by John Benedict, Remelia Shirlley (2024)
- ROME: Evaluating pre-trained vision-language models on reasoning beyond visual common sense, by ZHOU, Kankan, et al. (2023)