Employing explainability on facial landmarks for autism spectrum disorder diagnosis using deep CNN
Main Authors:
Format: Proceeding Paper
Language: English
Published: AIP Publishing, 2024
Subjects:
Online Access:
http://irep.iium.edu.my/115451/7/115451_%20Employing%20explainability.pdf
http://irep.iium.edu.my/115451/8/115451_%20Employing%20explainability_Scopus.pdf
http://irep.iium.edu.my/115451/
https://pubs.aip.org/aip/acp/article-abstract/3161/1/020124/3310613/Employing-explainability-on-facial-landmarks-for?redirectedFrom=fulltext
https://doi.org/10.1063/5.0229868
Institution: Universiti Islam Antarabangsa Malaysia
Summary: This paper presents a pioneering investigation into the use of deep Convolutional Neural Networks (CNNs) for the diagnosis of Autism Spectrum Disorder (ASD), with a specific emphasis on the integration of explainability techniques. While existing research has primarily focused on 2D facial images for ASD diagnosis, this study expands its scope to encompass both 2D and 3D modalities. Notably, the ResNet50V2 model demonstrates a remarkable accuracy of 94.66 ± 1.24 for 2D facial image ASD diagnosis, while the Xception model achieves an accuracy of 85.33 ± 3.09 for 3D images. By incorporating interpretability techniques such as Grad-CAM, the study aims to illuminate the decision-making processes of CNNs, thereby enhancing the transparency of diagnostic outcomes. Intriguing patterns in model behavior emerge across modalities: both the Xception and ResNet50V2 models exhibit distinct focal points when processing 2D and 3D images, revealing their specific sensitivities to distinct facial features. Nonetheless, challenges persist, as indicated by instances of misprediction. These discrepancies may arise from the intricate interplay of facial expressions, lighting conditions, and head poses, exacerbated by the variability in interpreting Grad-CAM heatmaps. The study's insights hold potential for refining diagnostic methodologies: adapting model architectures to the intricacies of 2D and 3D modalities, enriching training data with diverse expressions and poses, and addressing the interpretability limitations of heatmaps.
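The summary describes Grad-CAM heatmaps being used to expose which facial regions drive the CNNs' predictions. As a rough, minimal sketch of how such a heatmap is typically computed with a tf.keras classifier (this is an illustration, not the authors' implementation; the ResNet50V2 backbone, the layer name "post_relu", and the binary ASD/non-ASD output head are assumptions):

```python
# Minimal Grad-CAM sketch (illustrative only, not the paper's code).
# Assumes a tf.keras ResNet50V2-based classifier; "post_relu" is the last
# convolutional activation in the stock keras.applications ResNet50V2 and
# may be named differently in a fine-tuned model.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name="post_relu", class_index=None):
    """Return a heatmap of regions that most influence the model's prediction."""
    # Model exposing both the conv feature maps and the final predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add batch dimension
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))       # predicted class
        class_score = preds[:, class_index]
    # Gradient of the class score w.r.t. the conv feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Channel importance weights: global-average-pooled gradients.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps, then ReLU and normalisation to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()
```

Upsampling the returned map to the input resolution and overlaying it on the face image yields the kind of heatmap whose focal points and interpretive variability the summary discusses.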