Employing explainability on facial landmarks for autism spectrum disorder diagnosis using deep CNN

Bibliographic Details
Main Authors: Alam, Mohammad Shafiul, Rashid, Muhammad Mahbubur, Ali, Mohammad Yeakub, Yvette, Susiapan
Format: Proceeding Paper
Language: English
Published: AIP Publishing, 2024
Subjects:
Online Access:http://irep.iium.edu.my/115451/7/115451_%20Employing%20explainability.pdf
http://irep.iium.edu.my/115451/8/115451_%20Employing%20explainability_Scopus.pdf
http://irep.iium.edu.my/115451/
https://pubs.aip.org/aip/acp/article-abstract/3161/1/020124/3310613/Employing-explainability-on-facial-landmarks-for?redirectedFrom=fulltext
https://doi.org/10.1063/5.0229868
Institution: Universiti Islam Antarabangsa Malaysia
Description
Summary: This paper presents a pioneering investigation into the use of deep Convolutional Neural Networks (CNNs) for the diagnosis of Autism Spectrum Disorder (ASD), with a specific emphasis on the integration of explainability techniques. While existing research has primarily focused on 2D facial images for ASD diagnosis, this study expands its scope to encompass both 2D and 3D modalities. Notably, the ResNet50V2 model demonstrates a remarkable accuracy of 94.66 ± 1.24% for ASD diagnosis from 2D facial images, while the Xception model achieves 85.33 ± 3.09% for 3D images. By incorporating interpretability techniques such as Grad-CAM, the study aims to illuminate the decision-making processes of the CNNs and thereby increase the transparency of diagnostic outcomes. Distinct patterns in model behavior emerge across modalities: the Xception and ResNet50V2 models attend to different focal regions when processing 2D and 3D images, revealing their specific sensitivities to particular facial features. Nonetheless, challenges persist, as indicated by instances of misprediction. These discrepancies may arise from the interplay of facial expressions, lighting conditions, and head poses, compounded by variability in the interpretability of Grad-CAM heatmaps. The study's insights can inform refinements of diagnostic methodology: adapting model architectures to the particularities of 2D and 3D modalities, enriching training data with a wider range of expressions and poses, and addressing the interpretability limitations of the heatmaps.
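
Illustrative sketch: the following is a minimal Grad-CAM example for a Keras ResNet50V2 classifier of the kind described in the abstract, not the authors' actual implementation. The two-class (ASD / non-ASD) softmax head, the 224x224 input size, the layer name "post_relu" (the final convolutional activation in Keras' ResNet50V2), and the randomly generated stand-in image are assumptions made for the example.

# Minimal Grad-CAM sketch for a Keras ResNet50V2 classifier (illustrative only).
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return a Grad-CAM heatmap for a single image (H, W, 3), scaled to [0, 1]."""
    # Model mapping the input image to the last conv feature maps and the predictions.
    grad_model = keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the predicted class by default
        class_score = preds[:, class_index]
    # Gradients of the class score w.r.t. the conv feature maps, pooled into
    # per-channel importance weights (the Grad-CAM weighting step).
    grads = tape.gradient(class_score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, ReLU, then normalisation to [0, 1].
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.maximum(cam, 0) / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()

# Hypothetical classifier: ResNet50V2 backbone with a 2-class softmax head.
backbone = keras.applications.ResNet50V2(include_top=False, weights=None, input_shape=(224, 224, 3))
x = keras.layers.GlobalAveragePooling2D()(backbone.output)
outputs = keras.layers.Dense(2, activation="softmax")(x)
model = keras.Model(backbone.input, outputs)

face = np.random.rand(224, 224, 3).astype("float32")  # stand-in for a preprocessed facial image
heatmap = grad_cam(model, face, conv_layer_name="post_relu")
print(heatmap.shape)  # (7, 7) spatial map, to be upsampled and overlaid on the face

The heatmap highlights the facial regions that most influence the predicted class; in the study this is the mechanism used to compare where ResNet50V2 and Xception focus on 2D versus 3D facial images.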