Modeling personality traits of Filipino Twitter users based on linguistic markers

Bibliographic Details
Main Author: Tighe, Edward P.
Format: text
Language: English
Published: Animo Repository 2017
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/5330
Institution: De La Salle University
Description
Summary: Multiple studies have correlated a person's writing style with personality traits. Combined with machine learning, this work led to the rise of computational text-based personality trait recognition. The field continues to grow: it began with the analysis of personal essays and now explores the enormous amount of data available from social networking sites such as Facebook and Twitter. Current studies have shifted from analyzing English to analyzing non-English languages; however, the field is still lacking in three areas: (1) analysis of the Filipino language, (2) analysis of the word choice of Filipinos, or of a group of individuals, and (3) analysis of the output of feature reduction techniques. This research addressed each of these concerns by collecting and processing the tweets of 288 Filipino Twitter users. A language-independent approach was implemented to handle the multiple languages that individuals might use. Computational models were then created for each of the personality traits of the Five Factor Model. Findings show that Conscientiousness is the easiest trait to model (F1 = 0.8251; = 0.6499), while Openness is the hardest (F1 = 0.6194; = 0.2414). Analysis also showed that 1-grams are sufficient to model every Big Five trait except Extraversion, whose model used 1-, 2-, and 3-grams. This research also analyzed the feature-reduced datasets used by each trait's top-performing models to identify the composition of the feature set. Findings show that 11 LIWC2015 categories are common to all of the Big Five, such as Active Processes, Positive Emotion, and Informal Language.