Unifying text, tables, and images for multimodal question answering
Multimodal question answering (MMQA), which aims to derive answers from multiple knowledge modalities (e.g., text, tables, and images), has received increasing attention due to its broad applications. Current approaches to MMQA often rely on single-modal or bi-modal QA models, which limits their...
Main Authors: LUO, Haohao; SHEN, Ying; DENG, Yang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access: https://ink.library.smu.edu.sg/sis_research/9120
https://ink.library.smu.edu.sg/context/sis_research/article/10123/viewcontent/2023.findings_emnlp.626.pdf
Institution: Singapore Management University
Similar Items
- The subjectivity and subjectification of modal auxiliaries in Chinese (汉语情态助动词的主观性和主观化)
  by: 杨黎黎, et al.
  Published: (2015)
- Cross-modal recipe retrieval with stacked attention model
  by: CHEN, Jing-Jing, et al.
  Published: (2018)
- Alleviating the inconsistency of multimodal data in cross-modal retrieval
  by: Li, Tieying, et al.
  Published: (2024)
- Cross-modal recipe retrieval: How to cook this dish?
  by: CHEN, Jingjing, et al.
  Published: (2017)
- Epistemic modality in TED talks on education
  by: Ton Nu, My Nhat, et al.
  Published: (2019)