Automatically generating a sentiment lexicon for the Malay language
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2016 |
Online Access: | http://journalarticle.ukm.my/10056/1/11736-37831-1-PB.pdf http://journalarticle.ukm.my/10056/ http://ejournals.ukm.my/apjitm/issue/view/709 |
Institution: | Universiti Kebangsaan Malaysia |
Summary: | This paper proposes an automated sentiment lexicon generation model designed specifically for the Malay language. Lexicon-based Sentiment Analysis (SA) models rely on a sentiment lexicon, a linguistic resource that encodes a priori information about the sentiment properties of words. Such lexicons are indispensable for SA tasks, as reflected in the large body of research on sentiment lexicon generation algorithms. Low-resource languages such as Malay, however, have received little research attention in this area, which motivates the algorithm proposed here. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked to it through WordNet’s synonymy and antonymy relations, on the intuition that the sentiment properties of terms added through these relations are preserved. A supervised classifier was then trained for the word-polarity tagging task, using textual representations of the terms in the expanded seed set as features. Evaluated against the General Inquirer lexicon as a benchmark, the model performs with reasonable accuracy. The paper aims to provide a foundation for further research on sentiment lexicon generation for Malay. |
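The mapping and seed-expansion steps in the summary can be pictured as a graph traversal. The following is a minimal sketch, not the authors' implementation: the synset IDs, Malay lemma alignments and relation edges are invented toy data, and it assumes (a common convention for this kind of expansion that the summary does not spell out) that synonymy keeps a term's polarity while antonymy reverses it. The recursive expansion is written as an equivalent breadth-first loop bounded by a hypothetical `max_depth` parameter, and a term simply keeps the first label it receives.

```python
from collections import deque

# Hypothetical toy alignment: invented English synset IDs mapped to
# Malay lemmas, standing in for WordNet Bahasa mapped onto English WordNet.
MALAY_LEMMAS = {
    "good.a.01": ["baik", "bagus"],
    "bad.a.01": ["buruk"],
    "happy.a.01": ["gembira"],
    "sad.a.01": ["sedih"],
    "evil.a.01": ["jahat"],
}

# Hypothetical relation edges between synsets.
# Assumption for this sketch: "syn" keeps polarity, "ant" flips it.
RELATIONS = {
    "good.a.01": [("happy.a.01", "syn"), ("bad.a.01", "ant")],
    "bad.a.01": [("evil.a.01", "syn"), ("good.a.01", "ant")],
    "happy.a.01": [("sad.a.01", "ant"), ("good.a.01", "syn")],
    "sad.a.01": [("happy.a.01", "ant")],
    "evil.a.01": [("bad.a.01", "syn")],
}


def expand_seeds(seeds, relations, max_depth=3):
    """Breadth-first expansion of {synset: +1 or -1} seeds over relations."""
    polarity = dict(seeds)
    queue = deque((synset, 0) for synset in seeds)
    while queue:
        synset, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour, rel in relations.get(synset, []):
            label = polarity[synset] if rel == "syn" else -polarity[synset]
            if neighbour not in polarity:  # first label wins; no relabelling
                polarity[neighbour] = label
                queue.append((neighbour, depth + 1))
    return polarity


def project_to_malay(polarity, alignment):
    """Give each Malay lemma the polarity of its aligned English synset."""
    lexicon = {}
    for synset, label in polarity.items():
        for lemma in alignment.get(synset, []):
            lexicon[lemma] = label
    return lexicon


if __name__ == "__main__":
    seeds = {"good.a.01": 1, "bad.a.01": -1}
    print(project_to_malay(expand_seeds(seeds, RELATIONS), MALAY_LEMMAS))
```

With one positive and one negative seed, the toy network labels all six Malay lemmas. A real system would also need a policy for terms reached with conflicting labels (for example, voting), which this sketch sidesteps.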
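The summary does not name the classifier or define the textual representations, so the following is only a sketch under stated assumptions. It uses a multinomial Naive Bayes model (a swapped-in choice) over bag-of-words features drawn from a short description of each term, then scores predictions against benchmark labels. The example texts, labels, Malay terms and helper names (`NaiveBayesPolarity`, `accuracy`) are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one smoothing; a hypothetical
    stand-in for the paper's unnamed supervised classifier."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_score(self, tokens, c):
        # log P(c) plus the sum of log P(token | c) over in-vocabulary tokens
        n_docs = sum(self.class_counts.values())
        n_tokens = sum(self.word_counts[c].values())
        score = math.log(self.class_counts[c] / n_docs)
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are skipped
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (n_tokens + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.classes, key=lambda c: self._log_score(tokens, c))


def accuracy(predicted, gold):
    """Fraction of terms present in both dicts whose labels agree."""
    shared = [term for term in predicted if term in gold]
    if not shared:
        return 0.0
    return sum(predicted[t] == gold[t] for t in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical textual representations of expanded seed terms.
    texts = [
        "having desirable or positive qualities",
        "feeling or showing pleasure or joy",
        "having undesirable or negative qualities",
        "feeling or showing sorrow or unhappiness",
    ]
    labels = [1, 1, -1, -1]
    model = NaiveBayesPolarity().fit(texts, labels)
    predicted = {
        "riang": model.predict("showing joy and pleasure"),
        "duka": model.predict("sorrow and negative feeling"),
    }
    gold = {"riang": 1, "duka": -1}  # invented benchmark labels
    print(predicted, accuracy(predicted, gold))
```

Naive Bayes is used here only because it fits in a few lines of standard-library code. The `gold` dictionary merely stands in for benchmark entries such as those of the General Inquirer; how the paper aligned its Malay terms with that English benchmark is not described in the summary.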