Exploring neural network approaches in automatic personality recognition of Filipino Twitter users

Bibliographic Details
Main Authors: Tighe, Edward P., Aran, Oyan, Cheng, Charibeth K.
Format: text
Published: Animo Repository 2020
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/13395
Institution: De La Salle University
Description
Summary: The field of Automatic Personality Recognition (APR) is steadily growing in its goal to determine and understand personality traits. Many studies that work on text data have looked at different sources of data, feature extraction techniques, and machine learning techniques. More recently, studies have gravitated towards neural network (NN) based approaches and comparing them against traditional learning techniques. This presents an opportunity to explore the usefulness of NNs in APR when dealing with Filipino Twitter users. Filipino Twitter users typically write in both English and Tagalog. Because this mixes high- and low-resource languages, approaches centered on high-resource languages might not fully capture personality information. In our work, we performed APR on a dataset of 250 Filipino Twitter users and focused on Openness and Conscientiousness only. We explore (1) different multilayer perceptron (MLP) configurations fed by term frequency-inverse document frequency (TFIDF) values, and (2) the use of trained and pre-trained word embeddings (English and Tagalog) as features fed into the best identified MLP configurations. Findings show that none of the models for Openness performed well, with the best model having an R² value of 0.0211. As for Conscientiousness, a TFIDF-fed MLP with five hidden layers (128 units each) performed best, with an RMSE of 0.3344 and an R² of 0.2799 (an increase of 0.11 over the baseline). MLPs that used word embeddings, whether trained or pre-trained, did not perform very well; simple MLPs using binary or TFIDF features performed better.
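For readers unfamiliar with the quantities reported in the summary, the following is a minimal sketch of TFIDF weighting and of the two evaluation metrics (RMSE and R²). It is not the authors' code: the record does not state which TFIDF variant or software was used, so the weighting below (length-normalized term frequency times a plain logarithmic inverse document frequency) and the function names tfidf, rmse, and r2 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed variant (not from the source):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # an empty document has no terms to weight
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted trait scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean trait score has an R² of 0 and a perfect model has an R² of 1. Read this way, the reported 0.0211 for Openness is barely better than predicting the mean, while 0.2799 for Conscientiousness corresponds to roughly 28% of the variance explained.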