MULTIMODAL-BASED LIE DETECTION SYSTEM

Bibliographic Details
Main Author: Rahman, Fatur
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/72054
Institution: Institut Teknologi Bandung
Description
Summary: Lying is common in interpersonal communication. Past studies indicate that a lie can be recognized from speech and visual cues that a person displays unconsciously. Because few studies combine speech and visual cues and learn the relation between them for lie detection, we studied a multimodal lie detection system that uses acoustic, prosodic, lexical, and visual features. The study applied supervised machine learning to the data gathered by Pérez-Rosas and her colleagues in 2015. The dataset consists of 121 short video clips, each about 4 to 81 seconds long, together with a transcription and visual annotation of each clip. Multimodal classifiers were built through experiments on feature-category combinations and on modeling techniques. The modeling techniques were Neural Network and Extreme Learning Machine, chosen because many studies have shown that neural-network-based models work well for lie detection. Models were evaluated using cross-validation, which divided the data into five pairs, each with a training-to-testing data ratio of 5:1. The accuracy obtained was 80.00%, with an F-measure of 78.26%, for the Neural Network model using acoustic, lexical, and visual features. The Extreme Learning Machine model reached the same accuracy, with an F-measure of 80.00%, using only prosodic features.
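The evaluation procedure described in the summary (repeated training/testing pairs scored by accuracy and F-measure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`k_fold_pairs`, `f_measure`, `cross_validate`, `fit_nearest_centroid`) are hypothetical, the nearest-centroid classifier is only a placeholder for the study's Neural Network and Extreme Learning Machine models, and the features are synthetic rather than drawn from the Pérez-Rosas data. The split shown is standard k-fold partitioning, where k = 5 trains on about four fifths of the clips in each pair; the study's exact partitioning may differ.

```python
import random
from statistics import mean


def k_fold_pairs(n_items, k, seed=0):
    """Yield k (train_idx, test_idx) pairs; each item is tested exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(fit, X, y, k=5):
    """Average accuracy and F-measure over k train/test pairs.

    `fit(X_train, y_train)` must return a function mapping one
    feature vector to a predicted label.
    """
    accs, f_scores = [], []
    for train, test in k_fold_pairs(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in test]
        y_pred = [predict(X[i]) for i in test]
        accs.append(accuracy(y_true, y_pred))
        f_scores.append(f_measure(y_true, y_pred))
    return mean(accs), mean(f_scores)


def fit_nearest_centroid(X_train, y_train):
    """Placeholder classifier (NOT the study's model): nearest class centroid."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


if __name__ == "__main__":
    # Synthetic stand-in for 121 clips with 4 features each (1 = lie, 0 = truth).
    rng = random.Random(1)
    y = [i % 2 for i in range(121)]
    X = [[rng.gauss(2.0 * label, 1.0) for _ in range(4)] for label in y]
    acc, f1 = cross_validate(fit_nearest_centroid, X, y, k=5)
    print(f"accuracy={acc:.2%}  F-measure={f1:.2%}")
```

Passing the classifier in as a `fit` function keeps the evaluation loop independent of the model, so the same loop could score different feature-category combinations or modeling techniques by swapping the inputs.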