Does Google Translate affect lexicon-based sentiment analysis of Malay social media text? / Vanessa Enjop, Rosanita Adnan, Nursuriati Jamil, Sanizah Ahmad, Zarina Zainol and Siti Arpah Ahmad

Many sentiment resources exist for English, but few are available for a resource-poor language such as Malay. One approach to improving sentiment analysis is to translate the focus-language text into a resource-rich language such as English using Machine Translation...

Bibliographic Details
Main Authors: Enjop, Vanessa, Adnan, Rosanita, Jamil, Nursuriati, Zainol, Zarina, Ahmad, Siti Arpah
Format: Article
Language: English
Published: Universiti Teknologi MARA 2022
Online Access: https://ir.uitm.edu.my/id/eprint/69251/1/69251.pdf
https://ir.uitm.edu.my/id/eprint/69251/
https://mjoc.uitm.edu.my
Institution: Universiti Teknologi Mara
id my.uitm.ir.69251
record_format eprints
spelling my.uitm.ir.69251 2022-10-27T04:51:49Z https://ir.uitm.edu.my/id/eprint/69251/
Universiti Teknologi MARA 2022-10 Article PeerReviewed text en https://ir.uitm.edu.my/id/eprint/69251/1/69251.pdf Does Google translate affect lexicon-based sentiment analysis of Malay social media text? / Vanessa Enjop, Rosanita Adnan, Nursuriati Jamil, Sanizah Ahmad, Zarina Zainol and Siti Arpah Ahmad. (2022) Malaysian Journal of Computing (MJoC), 7 (2): 13. pp. 1236-1249. ISSN 2600-8238 https://mjoc.uitm.edu.my
institution Universiti Teknologi Mara
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Mara
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
description Many sentiment resources exist for English, but few are available for a resource-poor language such as Malay. One approach to improving sentiment analysis is to translate the focus-language text into a resource-rich language such as English using Machine Translation (MT). However, when text is translated from one language into another, sentiment is preserved to varying degrees. The objective of this paper is to assess how MT with Google Translate affects sentiment analysis of Malay social media text taken from the Facebook pages of a caregiver of a person with autism. A total of 3,525 Facebook comments in the Malay language were gathered from May to October 2020. The comments were manually translated into English to create dataset_manual, and Google Translate was used to translate the same Malay comments into English automatically, creating dataset_auto. The sentiment polarity of each comment was labeled to form a ground-truth dataset. A lexicon-based approach was then used to extract sentiment from both dataset_manual and dataset_auto and determine the sentiment polarity. Results show that sentiment analysis is significantly degraded when dataset_auto is used (65.9%). Sentiment expressions are often mistranslated into neutral expressions. By contrast, sentiment analysis using dataset_manual still captured the sentiment of the Facebook comments without taking them out of context, with 92.5% showing positive sentiment towards comments related to autism spectrum disorder.
format Article
author Enjop, Vanessa
Adnan, Rosanita
Jamil, Nursuriati
Zainol, Zarina
Ahmad, Siti Arpah
author_sort Enjop, Vanessa
title Does Google Translate affect lexicon-based sentiment analysis of Malay social media text? / Vanessa Enjop, Rosanita Adnan, Nursuriati Jamil, Sanizah Ahmad, Zarina Zainol and Siti Arpah Ahmad
publisher Universiti Teknologi MARA
publishDate 2022
url https://ir.uitm.edu.my/id/eprint/69251/1/69251.pdf
https://ir.uitm.edu.my/id/eprint/69251/
https://mjoc.uitm.edu.my
_version_ 1748183952954228736