Development of multilingual social media data corpus: Development and evaluation

The purpose of this study is to manually annotate a corpus for Bahasa Indonesia and Bahasa Melayu. Corpora for both languages have been built by many researchers before, but this research focuses only on words that appear in both vocabularies but have very different meanings. The data were obtained...

Bibliographic Details
Main Authors: Rumaisa, Fitrah; Saaya, Zurina; Khamis, Noorli; Basiron, Halizah
Format: Article
Language: English
Published: Primrose Hall Publishing Group 2019
Online Access: http://eprints.utem.edu.my/id/eprint/24401/2/6501_RUMAISA_2019_E_R%20%281%29.PDF
http://eprints.utem.edu.my/id/eprint/24401/
https://www.ijicc.net/images/Vol6Iss5/6501_Rumaisa_2019_E_R.pdf
Institution: Universiti Teknikal Malaysia Melaka
id my.utem.eprints.24401
record_format eprints
spelling my.utem.eprints.24401 2023-08-08T12:06:45Z
Rumaisa, Fitrah and Saaya, Zurina and Khamis, Noorli and Basiron, Halizah (2019) Development of multilingual social media data corpus: Development and evaluation. International Journal Of Innovation, Creativity And Change, 6 (5). pp. 1-14. ISSN 2201-1323
Article PeerReviewed text en
institution Universiti Teknikal Malaysia Melaka
building UTEM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknikal Malaysia Melaka
content_source UTEM Institutional Repository
url_provider http://eprints.utem.edu.my/
language English
description The purpose of this study is to manually annotate a corpus for Bahasa Indonesia and Bahasa Melayu. Corpora for both languages have been built by many researchers before, but this research focuses only on words that appear in both vocabularies but have very different meanings. The data were obtained from social media, so informal words were included. As many as 2,100 words were identified for each language, from which 300 words with the same vocabulary but different meanings were randomly selected. The objective of the study was to confirm that this condition can influence sentiment polarity results. At the end of the paper, we show how this condition in the two languages influences sentiment polarity. For the manual annotation, an annotation agreement test was carried out with three Bahasa Indonesia annotators and three Bahasa Melayu annotators. The annotation found that 63 of the 300 words differed in polarity. The agreement scores for each language show good agreement among the annotators during the annotation process.
format Article
author Rumaisa, Fitrah
Saaya, Zurina
Khamis, Noorli
Basiron, Halizah
title Development of multilingual social media data corpus: Development and evaluation
publisher Primrose Hall Publishing Group
publishDate 2019
url http://eprints.utem.edu.my/id/eprint/24401/2/6501_RUMAISA_2019_E_R%20%281%29.PDF
http://eprints.utem.edu.my/id/eprint/24401/
https://www.ijicc.net/images/Vol6Iss5/6501_Rumaisa_2019_E_R.pdf
_version_ 1775626884495704064
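
The annotation agreement test and the polarity comparison described in the abstract can be illustrated with a minimal sketch. The record does not say which agreement statistic the authors used, so Fleiss' kappa is an assumption here, as are the three-way label set, the majority-vote rule, the function names, and the placeholder entries `word_1` and `word_2`. None of the values below come from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of annotators.

    ratings: list of label lists, e.g. [["positive", "positive", "neutral"], ...]
    Assumption: the paper's actual agreement measure is not given in this record.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    label_totals = Counter()
    agreement_sum = 0.0
    for labels in ratings:
        counts = Counter(labels)
        label_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed = agreement_sum / n_items
    total = n_items * n_raters
    expected = sum((t / total) ** 2 for t in label_totals.values())
    if expected == 1.0:
        # Every annotator used one single label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def count_polarity_differences(indonesian, malay):
    """Count shared words whose majority polarity differs between the two languages."""
    differing = 0
    for word in indonesian.keys() & malay.keys():
        a = majority_label(indonesian[word])
        b = majority_label(malay[word])
        if a is not None and b is not None and a != b:
            differing += 1
    return differing


# Hypothetical toy annotations: three annotators per language, placeholder words.
indonesian = {
    "word_1": ["positive", "positive", "neutral"],
    "word_2": ["negative", "negative", "negative"],
}
malay = {
    "word_1": ["negative", "negative", "negative"],
    "word_2": ["negative", "negative", "neutral"],
}

print("kappa (Bahasa Indonesia):", fleiss_kappa(list(indonesian.values())))
print("kappa (Bahasa Melayu):", fleiss_kappa(list(malay.values())))
print("words with different polarity:", count_polarity_differences(indonesian, malay))
```

Computing the statistic separately per language mirrors the abstract's statement that agreement was scored for each language, while the majority vote is only one possible way to settle on a single polarity per word before comparing the two languages.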