Are vision language models multimodal learners?
Since the release of accessible vision language models (VLMs) such as GPT-4V and Gemini Pro in 2023, scholars have envisaged using these artificial intelligence (AI) models to support instructors and learners at scale. In particular, their capability to process visual and textual data simultaneously...
| Field | Value |
|---|---|
| Main Author | Lee, Gyeonggeon |
| Other Authors | School of Mechanical and Aerospace Engineering |
| Format | Conference or Workshop Item |
| Language | English |
| Published | 2024 |
| Online Access | https://hdl.handle.net/10356/181109 https://www.ntu.edu.sg/mae/ai-education-singapore-2024/activities/keynote-invited-talk#Content_C021_Col00 |
| Institution | Nanyang Technological University |
Similar Items
- Vision-language-model-based video quality assessment, by Zhang, Erli (2024)
- Vision language representation learning, by Yang, Xiaofeng (2023)
- Connecting Language and Vision for Natural Language-Based Vehicle Retrieval, by Bai, Shuai, et al. (2023)
- Language-guided object segmentation, by John Benedict, Remelia Shirlley (2024)
- ROME: Evaluating pre-trained vision-language models on reasoning beyond visual common sense, by ZHOU, Kankan, et al. (2023)