Using pre-trained models for vision-language understanding tasks

In recent years, remarkable progress has been made in Artificial Intelligence (AI), with an increasing focus on integrating AI systems into people’s daily lives. In the context of our diverse world, research attention has shifted towards applying AI to multimodal understanding tasks. This thesis spe...

Bibliographic Details
Main Author:	CAO, Rui
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Vision-language understanding; Visual question answering; Hateful meme detection; Pre-trained models; Computer Sciences; Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/etd_coll/595
	https://ink.library.smu.edu.sg/context/etd_coll/article/1593/viewcontent/Rui_Thesis_PTMs_VLU.pdf
Institution:	Singapore Management University

Similar Items

Enhancing visual grounding in vision-language pre-training with position-guided text prompts
by: WANG, Alex Jinpeng, et al.
Published: (2024)

Injecting descriptive meta-information into pre-trained language models with hypernetworks
by: DUAN, Wenying, et al.
Published: (2021)

Multimedia question answering
by: NIE LIQIANG
Published: (2013)

On the transferability of pre-trained language models for low-resource programming languages
by: CHEN, Fuxiang, et al.
Published: (2022)

Disentangling hate in online memes
by: LEE, Ka Wei, Roy, et al.
Published: (2021)

On explaining multimodal hateful meme detection models
by: HEE, Ming Shan, et al.
Published: (2022)

Position-guided text prompt for vision-language pre-training
by: WANG, Alex Jinpeng, et al.
Published: (2023)

Segmentation of multi-sentence questions: Towards effective question retrieval in cQA services
by: Wang, K., et al.
Published: (2013)

VLStereoSet: A study of stereotypical bias in pre-trained vision-language models
by: ZHOU, Kankan, et al.
Published: (2022)

Interventional training for out-of-distribution natural language understanding
by: YU, Sicheng, et al.
Published: (2022)

Retrieving questions and answers in community-based question answering services
by: WANG KAI
Published: (2011)

Domain-specific cross-language relevant question retrieval
by: XU, Bowen, et al.
Published: (2016)

A syntactic tree matching approach to finding similar questions in community-based QA services
by: Wang, K., et al.
Published: (2013)

From text question-answering to multimedia QA on web-scale media resources
by: Chua, T.-S., et al.
Published: (2013)

On the usage of continual learning for out-of-distribution generalization in pre-trained language models of code
by: WEYSSOW, Martin, et al.
Published: (2023)

Hallucination detection: Robustly discerning reliable answers in Large Language Models
by: CHEN, Yuyuan, et al.
Published: (2023)

On true language understanding
by: HO, Seng-Beng, et al.
Published: (2019)

Towards generating deep questions from text
by: PAN, LIANGMING
Published: (2022)

Applying semantic analysis to finding similar questions in community question answering systems
by: NGUYEN LE NGUYEN
Published: (2010)

Aggregated community question answering
by: Snehasish Banerjee, et al.
Published: (2015)

Do-GOOD: Towards distribution shift evaluation for pre-trained visual document understanding models
by: HE, Jiabang, et al.
Published: (2023)

A BERT-based two-stage model for Chinese Chengyu recommendation
by: TAN, Minghuan, et al.
Published: (2021)

A Comparison of Quality, Speed, Scope and Usability between English and Chinese CQAs
by: Chua, Alton Yeow Kuan, et al.
Published: (2016)

Resource-efficient learning for vision-capable neural models
by: Tiong, Anthony Meng Huat
Published: (2024)

Answers or no answers: Studying question answerability in Stack Overflow
by: Chua, Alton Yeow Kuan, et al.
Published: (2020)

Dependency relation matching for answer selection
by: Sun, R., et al.
Published: (2014)

Scene understanding through multimodal reasoning for robotic surgery
by: SEENIVASAN LALITHKUMAR
Published: (2024)

Question and answer sentence patterns used in spoken Thai (แบบประโยคคำถามและคำตอบที่ใช้พูดในภาษาไทย)
by: สายสวาท อินทิแสน
Published: (2014)

Language and robotics: Complex sentence understanding
by: HO, Seng-Beng, et al.
Published: (2019)

Cross-thought for sentence encoder pre-training
by: WANG, Shuohang, et al.
Published: (2020)

VadCLIP: Adapting vision-language models for weakly supervised video anomaly detection
by: WU, Peng, et al.
Published: (2024)

English versus Chinese: A Cross-Lingual Study of Community Question Answering Sites
by: Chua, Alton Yeow Kuan, et al.
Published: (2017)

Interesting nuggets and their impact on definitional question answering
by: Kor, K.-W., et al.
Published: (2013)

Exploring large scale data for multimedia QA: An initial study
by: Hong, R., et al.
Published: (2013)

Combining multimodal external resources for event-based news video retrieval and question answering
by: NEO SHI YONG
Published: (2010)

Soft pattern matching models for definitional question answering
by: Cui, H., et al.
Published: (2013)

Video reference: A video question answering engine
by: Gao, L., et al.
Published: (2013)

FADA: Find All Distinct Answers
by: Yang, H., et al.
Published: (2013)

Effectiveness of Web page classification on finding list answers
by: Yang, H., et al.
Published: (2013)

Sentiment analysis for software engineering: How far can pre-trained transformer models go?
by: ZHANG, Ting, et al.
Published: (2020)