Multitasking deep neural network models for Arabic dialect sentiment analysis

Bibliographic Details
Main Author: Alali, Muath Mohammad Oqlah
Format: Thesis
Language: English
Published: 2022
Subjects: Arabic language - Dialects; Text processing (Computer science); Deep learning (Machine learning)
Online Access: http://psasir.upm.edu.my/id/eprint/113149/1/113149%20UPM.pdf
http://psasir.upm.edu.my/id/eprint/113149/
Institution: Universiti Putra Malaysia
id my.upm.eprints.113149
record_format eprints
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
topic Arabic language - Dialects
Text processing (Computer science).
Deep learning (Machine learning).
description Polarity classification, or sentiment analysis, is an opinion mining task that distinguishes between polarity categories (two, three, or five) reflecting the degree of sentiment a text may contain (for example, positive and negative for two polarities; positive, neutral, and negative for three polarities). Few deep neural network approaches have been applied to this task for Arabic dialects (AD). Traditional machine learning (ML) algorithms based on manually extracted features, on the other hand, are considered tedious and time-consuming, as the Arabic language contains multiple dialects and has no fixed word order. The process of extracting features such as syntactic and lexical information is therefore more challenging for AD. According to the literature review, the best-performing and most widely used deep learning model for Arabic sentiment analysis has been the Convolutional Neural Network (CNN). Existing convolutional network models are based on wide convolutions with a shallow structure, which gives less uniform importance to the features; such models cannot represent the entire sentiment information in a text sequence, which leads to poor detection of sentiment information. Therefore, a Narrow Convolutional Neural Network (NCNN) is proposed to extract comprehensive sentiment information from a text sequence by maximizing the feature detection range, which gives more uniform importance to the words and improves the final performance on Arabic dialect classification tasks (two and three polarities). NCNN achieves its optimum performance when structured with three convolutional layers. A sensitivity analysis is conducted to evaluate the impact of various combinations of NCNN structural hyperparameters, such as the pooling size, the filter size, and the number of convolutional filters, on classification performance.
The proposed NCNN achieved a higher macro-average recall (R), outperforming Naive Bayes (NB) on task A (three polarities) and the Voting model on task B (two polarities) of the SemEval-2017 Arabic dialect Twitter dataset. In addition, the NCNN model outperforms CNN-ASWAR on the Arabic Sentiment Tweets Dataset (ASTD) with a higher F1-score. Negation words in the Arabic language play a significant role in sentiment analysis (SA), since they may reverse the context of a sentence. So far, there has been no effort to handle the negation context in Arabic using a deep neural network. The existing approaches are based on traditional machine learning algorithms, such as the support vector machine (SVM). However, these approaches did not consider Arabic dialect negation words. In addition, they rely on domain-specific features and lexicons, which might not work in other domains. The ordinal (five-polarity) classification problem has received attention in Arabic sentiment analysis. Most of the applied approaches are based on single-task learning (STL) using machine learning algorithms, such as Logistic Regression (LR) and a Hierarchical Classifier (HC) based on the divide-and-conquer approach. However, these approaches use a simple sentence representation. Moreover, because they are based on STL, they cannot learn the relatedness between different tasks (cross-task transfer) or model several polarity schemes jointly, such as three and five polarities. Therefore, a model called Multi-Tasking Learning based on Convolutional Hierarchical Attention Neural Network (MTL-CHAN) is proposed, comprising (i) a shared word encoder and word attention networks across classification tasks, and (ii) task-specific layers with convolutional neural network-based attention (CNNA) at the sentence level, to handle explicit Arabic negation words and to improve classification performance by training the Arabic classification tasks (binary, ternary, and five-polarity) jointly.
The experimental results showed outstanding performance of the proposed MTL-CHAN model, with high accuracy of 89.85%, 84.69%, and 85.90% on the HARD, LABR, and BRAD datasets, respectively, and higher macro-average recall (R) of 0.680 and 0.810 on tasks A and B of the Twitter Arabic dialects dataset, respectively. The proposed model also achieved higher accuracy of 95.25%, 87.75%, 86.01%, and 90.95% on the Hotel, Product, Movie, and Restaurant datasets, respectively.
format Thesis
author Alali, Muath Mohammad Oqlah
title Multitasking deep neural network models for Arabic dialect sentiment analysis
publishDate 2022
url http://psasir.upm.edu.my/id/eprint/113149/1/113149%20UPM.pdf
http://psasir.upm.edu.my/id/eprint/113149/
_version_ 1814936539588722688
spelling my.upm.eprints.113149 2024-10-28T02:53:32Z http://psasir.upm.edu.my/id/eprint/113149/ Multitasking deep neural network models for Arabic dialect sentiment analysis Alali, Muath Mohammad Oqlah 2022-08 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/113149/1/113149%20UPM.pdf Alali, Muath Mohammad Oqlah (2022) Multitasking deep neural network models for Arabic dialect sentiment analysis. Doctoral thesis, Universiti Putra Malaysia. Arabic language - Dialects Text processing (Computer science). Deep learning (Machine learning).
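
The description reports macro-average recall (R) for the SemEval-2017 tasks. As a reading aid, the sketch below shows how that metric is commonly computed: recall is calculated per class and the per-class values are averaged with equal weight, so a small class counts as much as a large one. The function name, the label strings, and the toy predictions are assumptions made for illustration; they are not taken from the thesis or its datasets.

```python
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    support = defaultdict(int)  # number of gold examples per class
    hits = defaultdict(int)     # correctly predicted examples per class
    for gold_label, predicted_label in zip(y_true, y_pred):
        support[gold_label] += 1
        if gold_label == predicted_label:
            hits[gold_label] += 1

    # Recall for class c = hits[c] / support[c]; every class gets equal weight.
    per_class = [hits[c] / support[c] for c in support]
    return sum(per_class) / len(per_class)


# Hypothetical three-polarity toy example (illustrative labels only).
gold = ["positive", "positive", "neutral", "negative", "negative", "negative"]
pred = ["positive", "neutral", "neutral", "negative", "negative", "positive"]
score = macro_average_recall(gold, pred)
print(f"macro-average recall: {score:.3f}")
```

Because recall is a ratio between 0 and 1, scores such as those for tasks A and B are normally quoted either as fractions or as percentages, but not as both at once.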