Analysis of the performance of the Psycholinguistic-based approach to automatically detect English satire in social media platforms

Bibliographic Details
Main Author: Nguyen, Ha Quan
Other Authors: Tan Chee Wah Wesley
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/74973
Institution: Nanyang Technological University
Description
Summary: Satire is a mode of communication that combines irony, humor, and exaggeration to criticize and ridicule people's stupidity, contemporary politics, or other topical issues; as a result, satirical statements often carry meanings opposed to their literal interpretation. Satire is hard to detect even for human readers who lack knowledge of the topic being satirized, and harder still for computers. Satire detection has therefore become a challenging problem in Natural Language Processing (NLP). The objective of this project is to build a new satire detection model, based on a psycholinguistic approach, for English social media content on Twitter and to evaluate its performance. What most distinguishes this project from previous work on satire detection is its feature extraction. In the feature extraction phase, each tweet in the pre-processed, labeled data is analyzed with the Linguistic Inquiry and Word Count (LIWC) software to generate a feature vector whose dimension depends on the LIWC categories used. LIWC has more than 93 categories, classified into 5 main groups: (1) Linguistic Processes, (2) Psychological Processes, (3) Personal Concerns, (4) Spoken Categories, and (5) Punctuation Marks. The predictive model is generated and validated with 10-fold cross-validation, and its performance is measured with the standard metrics Precision, Recall, F-measure, and Accuracy.
The experiments are conducted as follows. The first experiment determines the importance of the Punctuation Marks group in building the satire detection model by training the model on a corpus from which all punctuation marks were removed during text pre-processing. The second experiment identifies the combination of category groups that produces the best results on these metrics. Finally, the most contributing features and the classifier best suited to English satire detection are determined.
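
As a rough illustration of the pipeline described in the summary, the Python sketch below builds a LIWC-style feature vector, splits a data set into 10 folds, and computes the four metrics. It is not the project's implementation: the lexicon `TOY_LEXICON`, the category names, and the helpers `category_vector`, `k_fold_indices`, and `binary_metrics` are all hypothetical, and the real LIWC dictionary is proprietary and far larger. The F-measure is assumed here to be the balanced F1 score, and the folds are not shuffled.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for the proprietary LIWC dictionary.
TOY_LEXICON = {
    "negemo": {"hate", "awful", "stupid"},
    "posemo": {"love", "great", "brilliant"},
    "pronoun": {"i", "you", "we", "they"},
}
# Word categories first (alphabetical), then two punctuation categories.
CATEGORIES = sorted(TOY_LEXICON) + ["exclam", "qmark"]


def category_vector(text):
    """Return one rate per category, in CATEGORIES order, per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)  # avoid division by zero on empty tweets
    counts = Counter()
    for word in words:
        for name, vocab in TOY_LEXICON.items():
            if word in vocab:
                counts[name] += 1
    counts["exclam"] = text.count("!")
    counts["qmark"] = text.count("?")
    return [100.0 * counts[name] / total for name in CATEGORIES]


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; fold i holds out items i, i+k, i+2k, ..."""
    for fold in range(k):
        test = list(range(fold, n_items, k))
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Training a classifier on each training fold is left out of the sketch because the summary does not name the classifiers that were compared. Dropping the last two entries of each vector would mimic the first experiment, in which punctuation is excluded.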