Audio-visual adapter for multi-modal deception detection
| Main Author: | Li, Zhaoxu |
|---|---|
| Other Authors: | Alex Chichung Kot |
| Format: | Thesis-Master by Coursework |
| Language: | English |
| Published: | Nanyang Technological University, 2023 |
| Subjects: | Engineering::Electrical and electronic engineering |
| Online Access: | https://hdl.handle.net/10356/171383 |
| Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-171383 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-171383 2023-10-27T15:43:52Z. Audio-visual adapter for multi-modal deception detection. Li, Zhaoxu; Alex Chichung Kot. School of Electrical and Electronic Engineering; Rapid-Rich Object Search (ROSE) Lab. EACKOT@ntu.edu.sg. Subject: Engineering::Electrical and electronic engineering. Degree: Master of Science (Computer Control and Automation). Deposited 2023-10-25T01:42:36Z; issued 2023. Thesis-Master by Coursework. Citation: Li, Z. (2023). Audio-visual adapter for multi-modal deception detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171383 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering |
description |
Deception detection based on human behavior is important in many fields, including customs security and multimedia anti-fraud. However, progress in deception detection research is hindered by two main challenges: the scarcity of high-quality deception data, and the complexity of learning from multimodal data. In particular, deception data from Asian subjects is scarce, underscoring the need for further exploration and development in this area. To address the scarcity of high-quality deception data with Asian subjects, this project collects a multi-modal dataset tailored to deception detection. The dataset covers four distinct conversational scenarios, each containing a substantial amount of deceptive content, and its Asian speakers are diverse across languages, genders, ethnicities, and ages. Recently, audio-visual deception detection has attracted increasing interest, as it outperforms detection based on a single modality alone. In real-world deployments, however, data-integrity issues arise: only a subset of the modalities may be available at inference time. Such missing modalities can degrade performance, because the model can no longer draw on features from the absent modality.
To address the missing-modality challenge and further improve performance, a framework called the Audio-Visual Adapter (AVA) is proposed. AVA efficiently fuses temporal features across the two modalities to overcome the missing-modality problem: it combines the visual feature and the audio feature from the same time slot into a new temporal feature. When one modality is missing, the remaining modality can still draw on cross-modal information learned from the missing one, as sketched in the example below. By leveraging the capabilities of AVA, we aim to significantly improve performance in multi-modal deception detection.
Experiments are conducted on two benchmark datasets, and the results demonstrate that the proposed AVA outperforms other multi-modal fusion techniques, particularly in flexible-modal settings involving multiple and missing modalities. These results showcase the potential of the AVA framework for audio-visual deception detection. |
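The record contains no implementation details, so the following PyTorch sketch is only one plausible reading of the fusion described above: audio and visual features from the same time slot are concatenated and passed through a lightweight bottleneck adapter, and a learned placeholder embedding stands in for a missing modality so the remaining stream is fused through the same pathway. All names here (`AudioVisualAdapter`, `d_model`, `bottleneck`) are illustrative assumptions, not the author's actual code.

```python
import torch
import torch.nn as nn


class AudioVisualAdapter(nn.Module):
    """Minimal sketch of per-time-slot audio-visual fusion.

    Assumes both streams are pre-aligned to the same number of time
    slots T and projected to a shared dimension d_model (both are
    assumptions; the thesis may align and embed features differently).
    """

    def __init__(self, d_model: int = 256, bottleneck: int = 64):
        super().__init__()
        # Learned stand-ins used when a modality is absent.
        self.audio_placeholder = nn.Parameter(torch.zeros(1, 1, d_model))
        self.visual_placeholder = nn.Parameter(torch.zeros(1, 1, d_model))
        # Lightweight bottleneck adapter fusing the two streams per time slot.
        self.fuse = nn.Sequential(
            nn.Linear(2 * d_model, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, audio=None, visual=None):
        # audio, visual: (batch, T, d_model); either may be None.
        if audio is None and visual is None:
            raise ValueError("at least one modality is required")
        ref = audio if audio is not None else visual
        B, T, _ = ref.shape
        if audio is None:
            audio = self.audio_placeholder.expand(B, T, -1)
        if visual is None:
            visual = self.visual_placeholder.expand(B, T, -1)
        # Concatenate same-time-slot features and adapt them into a new
        # temporal feature; the residual keeps the reference stream intact.
        fused = self.fuse(torch.cat([audio, visual], dim=-1))
        return self.norm(ref + fused)


if __name__ == "__main__":
    ava = AudioVisualAdapter()
    a = torch.randn(2, 16, 256)  # audio features: batch 2, 16 time slots
    v = torch.randn(2, 16, 256)  # visual features, same time slots
    print(ava(audio=a, visual=v).shape)     # torch.Size([2, 16, 256])
    print(ava(audio=None, visual=v).shape)  # missing-audio case
```

Under this reading, the residual path lets the available modality's features pass through unchanged while the adapter injects whatever cross-modal signal was learned during training; the flexible-modal evaluation mentioned in the abstract would then amount to calling the same module with one argument set to `None`.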
author2 |
Alex Chichung Kot |
format |
Thesis-Master by Coursework |
author |
Li, Zhaoxu |
title |
Audio-visual adapter for multi-modal deception detection |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171383 |
_version_ |
1781793686309306368 |