Audio-visual deception detection: DOLOS dataset and parameter-efficient crossmodal learning
Deception detection in conversations is a challenging yet important task, with pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and customs security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as by the difficulty of learning multimodal features effectively. To address these issues, we introduce DOLOS (the name comes from Greek mythology), the largest gameshow deception detection dataset with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects and is labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark the dataset with previously proposed deception detection approaches. To further improve performance while fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), in which a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Building on the rich fine-grained audio-visual annotations in DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and source code are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.
Saved in:
Main Authors: Guo, Xiaobao; Nithish Muthuchamy Selvaraj; Yu, Zitong; Kong, Adams Wai Kin; Shen, Bingquan; Kot, Alex
Other Authors: Interdisciplinary Graduate School (IGS)
Format: Conference or Workshop Item
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Deception; Dataset; Audio-Visual
Online Access: https://hdl.handle.net/10356/169721
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-169721
record_format: dspace
spelling:
sg-ntu-dr.10356-169721 (2023-08-06T15:36:21Z)
Title: Audio-visual deception detection: DOLOS dataset and parameter-efficient crossmodal learning
Authors: Guo, Xiaobao; Nithish Muthuchamy Selvaraj; Yu, Zitong; Kong, Adams Wai Kin; Shen, Bingquan; Kot, Alex
Affiliations: Interdisciplinary Graduate School (IGS); School of Computer Science and Engineering; School of Electrical and Electronic Engineering; DSO National Laboratories; Rapid-Rich Object Search (ROSE) Lab
Conference: 2023 International Conference on Computer Vision (ICCV)
Subjects: Engineering::Computer science and engineering; Deception; Dataset; Audio-Visual
Version: Submitted/Accepted version
Funding: This research is supported in part by the NTU-PKU Joint Research Institute (a collaboration between the Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), and the DSO National Laboratories, Singapore, under project agreement No. DSOCL21238.
Deposited: 2023-08-04T00:53:24Z
Published: 2023
Type: Conference Paper
Citation: Guo, X., Nithish Muthuchamy Selvaraj, Yu, Z., Kong, A. W. K., Shen, B. & Kot, A. (2023). Audio-visual deception detection: DOLOS dataset and parameter-efficient crossmodal learning. 2023 International Conference on Computer Vision (ICCV). https://hdl.handle.net/10356/169721
Language: en
Grant: DSOCL21238
Rights: © 2023 The Author(s). All rights reserved. This paper was published in the Proceedings of 2023 International Conference on Computer Vision (ICCV) and is made available with permission of The Author(s).
File format: application/pdf
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering; Deception; Dataset; Audio-Visual
description: Deception detection in conversations is a challenging yet important task, with pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and customs security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as by the difficulty of learning multimodal features effectively. To address these issues, we introduce DOLOS (the name comes from Greek mythology), the largest gameshow deception detection dataset with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects and is labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark the dataset with previously proposed deception detection approaches. To further improve performance while fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), in which a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Building on the rich fine-grained audio-visual annotations in DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and source code are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.
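The record names the PECL components (UT-Adapter, PAVF, and multi-task prediction) but contains no implementation details. The sketch below is a minimal, hypothetical PyTorch illustration of the general recipe the abstract describes: frozen uni-modal backbones augmented with small residual adapters, cross-attention fusion of audio and visual tokens, and joint prediction of deception and auxiliary audio-visual attributes. All module names, dimensions, and the attribute count are assumptions for illustration only, not the authors' implementation (see the linked GitHub repository for the actual code).

```python
import torch
import torch.nn as nn

# Illustrative sketch only -- not the authors' PECL implementation.

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck added alongside a frozen backbone layer."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: frozen features pass through unchanged plus a learned correction.
        return x + self.up(self.act(self.down(x)))


class CrossModalFusion(nn.Module):
    """Bidirectional cross-attention: audio tokens attend to visual tokens and vice versa."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor):
        # audio: (B, Ta, D), visual: (B, Tv, D)
        a_fused, _ = self.a2v(self.norm_a(audio), visual, visual)
        v_fused, _ = self.v2a(self.norm_v(visual), audio, audio)
        return audio + a_fused, visual + v_fused


class MultiTaskHead(nn.Module):
    """Joint heads: binary deception logits plus auxiliary audio-visual attribute logits."""
    def __init__(self, dim: int, num_attributes: int = 10):  # attribute count is a placeholder
        super().__init__()
        self.deception = nn.Linear(dim, 2)
        self.attributes = nn.Linear(dim, num_attributes)

    def forward(self, tokens: torch.Tensor):
        pooled = tokens.mean(dim=1)  # temporal average pooling over all tokens
        return self.deception(pooled), self.attributes(pooled)


if __name__ == "__main__":
    # Random tensors stand in for features produced by frozen audio/visual encoders.
    B, Ta, Tv, D = 2, 32, 16, 256
    audio, visual = torch.randn(B, Ta, D), torch.randn(B, Tv, D)

    adapter_a, adapter_v = BottleneckAdapter(D), BottleneckAdapter(D)
    fusion, head = CrossModalFusion(D), MultiTaskHead(D)

    a, v = fusion(adapter_a(audio), adapter_v(visual))
    deception_logits, attribute_logits = head(torch.cat([a, v], dim=1))
    print(deception_logits.shape, attribute_logits.shape)  # (2, 2) (2, 10)
```

In a setup like this, only the adapters, the fusion module, and the heads carry trainable parameters while the pretrained backbones stay frozen, which is what makes the approach parameter-efficient relative to full fine-tuning.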
author2: Interdisciplinary Graduate School (IGS)
format: Conference or Workshop Item
author: Guo, Xiaobao; Nithish Muthuchamy Selvaraj; Yu, Zitong; Kong, Adams Wai Kin; Shen, Bingquan; Kot, Alex
author_sort: Guo, Xiaobao
title: Audio-visual deception detection: DOLOS dataset and parameter-efficient crossmodal learning
publishDate: 2023
url: https://hdl.handle.net/10356/169721