Audio-visual adapter for multi-modal deception detection


Bibliographic Details
Main Author: Li, Zhaoxu
Other Authors: Alex Chichung Kot
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Electrical and electronic engineering
Online Access:https://hdl.handle.net/10356/171383
Institution: Nanyang Technological University
id sg-ntu-dr.10356-171383
record_format dspace
record_updated 2023-10-27T15:43:52Z
school School of Electrical and Electronic Engineering
laboratory Rapid-Rich Object Search (ROSE) Lab
supervisor_contact EACKOT@ntu.edu.sg
subject Engineering::Electrical and electronic engineering
degree Master of Science (Computer Control and Automation)
date_accessioned 2023-10-25T01:42:36Z
citation Li, Z. (2023). Audio-visual adapter for multi-modal deception detection. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171383
format application/pdf
building NTU Library
country Singapore
collection DR-NTU
description Deception detection based on human behavior holds significant importance in various fields, including customs security and multimedia anti-fraud. However, progress in deception detection research is hindered by two main challenges: the scarcity of high-quality deception data and the complexity of learning from multimodal data. In addition, Asian deception data are particularly scarce. These limitations emphasize the need for further exploration and development in this area.

To address the scarcity of high-quality deception data with Asian subjects, a multi-modal dataset tailored to deception detection is collected in this project. The dataset covers four distinct conversational scenarios, each containing a substantial amount of deceptive content, and its Asian speakers are diverse across languages, genders, ethnicities, and ages.

Recently, audio-visual deception detection has attracted increasing interest, as it outperforms detection based on a single modality alone. In real-world scenarios involving multiple modalities, however, data-integrity issues may arise: for example, only part of the modalities may be available. The absence of a modality can degrade performance, because the model can no longer capture features from the missing modality. To address the missing-modality challenge and further improve performance, a framework called Audio-Visual Adapter (AVA) is proposed. AVA efficiently fuses temporal features across the two modalities: it combines the visual and audio features from the same time slot into a new joint temporal feature, so that when one modality is missing, the remaining modality still carries information learned from the missing one. Leveraging AVA, we aim to significantly improve multi-modal deception detection performance. Experiments on two benchmark datasets demonstrate that the proposed AVA outperforms other multi-modal fusion techniques, particularly in flexible-modal settings with multiple and missing modalities.
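The per-time-slot fusion described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: all names (`make_adapter`, `fuse`, the weight matrices) are hypothetical, the random projections stand in for learned adapter parameters, and the fallback for a missing modality is simplified to projecting only the remaining stream.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(d_v, d_a, d_out, rng):
    """Random projection weights standing in for learned adapter parameters."""
    return {
        "W_v": rng.standard_normal((d_v, d_out)) * 0.1,  # visual -> shared space
        "W_a": rng.standard_normal((d_a, d_out)) * 0.1,  # audio  -> shared space
    }

def fuse(adapter, vis, aud, vis_present=True, aud_present=True):
    """Fuse per-time-slot visual (T, d_v) and audio (T, d_a) features into one
    joint temporal feature (T, d_out). If a modality is missing, the output
    falls back to the projection of the remaining modality, so downstream
    layers still receive a well-formed sequence."""
    parts = []
    if vis_present:
        parts.append(vis @ adapter["W_v"])
    if aud_present:
        parts.append(aud @ adapter["W_a"])
    if not parts:
        raise ValueError("at least one modality must be present")
    # Average the aligned per-time-slot projections (element-wise over time).
    return np.mean(parts, axis=0)

T, d_v, d_a, d_out = 8, 32, 16, 24
adapter = make_adapter(d_v, d_a, d_out, rng)
vis = rng.standard_normal((T, d_v))
aud = rng.standard_normal((T, d_a))

both = fuse(adapter, vis, aud)                            # full multi-modal input
vision_only = fuse(adapter, vis, aud, aud_present=False)  # audio stream missing
print(both.shape, vision_only.shape)  # (8, 24) (8, 24)
```

In a trained model, the shared projection space is what lets the remaining modality carry cross-modal information; here the averaging merely keeps the output shape stable under a flexible-modal (missing-modality) setting.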