Exploring a multimodal fusion-based deep learning network for detecting facial palsy

Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute a study analyzing the effect of different data modalities and the benefits of a multimodal fusion-based approach, using videos of 21 facial palsy patients. Our experimental results show that among the various data modalities (unstructured data: RGB images and images of facial line segments; structured data: coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expressions achieved the highest precision of 76.22, while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.
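The abstract describes a two-branch design: a ResNet over images of facial line segments and a feed-forward network over facial expression features, whose outputs are fused for classification. The paper's actual implementation is not reproduced in this record; the following is a minimal PyTorch sketch of what such a late-fusion network could look like. All layer sizes, the feature dimension (num_expression_features), and the concatenation-based fusion strategy are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class FusionPalsyNetSketch(nn.Module):
    """Hypothetical late-fusion model: image branch + structured-feature branch."""

    def __init__(self, num_expression_features=17, num_classes=2):
        super().__init__()
        # Unstructured branch: ResNet backbone over line-segment images.
        # The fc layer is replaced with Identity to expose the 512-d embedding.
        self.image_branch = models.resnet18(weights=None)
        self.image_branch.fc = nn.Identity()
        # Structured branch: feed-forward network over expression features.
        self.feature_branch = nn.Sequential(
            nn.Linear(num_expression_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Fusion head: concatenate both embeddings, then classify.
        self.classifier = nn.Linear(512 + 32, num_classes)

    def forward(self, image, expression_features):
        img_emb = self.image_branch(image)                    # (B, 512)
        feat_emb = self.feature_branch(expression_features)   # (B, 32)
        fused = torch.cat([img_emb, feat_emb], dim=1)         # (B, 544)
        return self.classifier(fused)

# Usage with dummy inputs (batch of 4 images and feature vectors):
# logits = FusionPalsyNetSketch()(torch.randn(4, 3, 224, 224), torch.randn(4, 17))
```

Concatenation at the embedding level is only one plausible reading of "multimodal fusion"; the paper may fuse at a different depth or with a different operator.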

Bibliographic Details
Main Authors: OO, Heng Yim Nicole, LEE, Min Hun, LIM, J. H.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Machine Learning; Computer Vision; Multimodal Fusion; Facial Analysis; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:https://ink.library.smu.edu.sg/sis_research/9958
https://ink.library.smu.edu.sg/context/sis_research/article/10958/viewcontent/2405.16496v1.pdf
Institution: Singapore Management University
Collection: InK@SMU, Research Collection School of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/