EARLY RISKS DEPRESSION PREDICTION OF INDONESIAN TWITTER USERS THROUGH INDONESIAN TEXT USING TRANSFER LEARNING AND LINGUISTIC METADATA FEATURES APPROACHES

Bibliographic Details
Main Author: Puteri Aulia, Widya
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/68635
Institution: Institut Teknologi Bandung
Description
Summary: Depressive disorders rank first among the mental disorders that cause DALYs (Disability-Adjusted Life Years), with a contribution to deaths of around 14.4%. However, 78% of people in low- and middle-income countries have not been able to get proper treatment. Currently, depression is detected through self-reports by sufferers, behavior reported by sufferers' friends, and mental health examinations. As a result, many people with depression go undetected, because neither they nor the people around them actively report the disorder. In everyday life, social media facilitates the pre-diagnosis of clinical mental health conditions related to anxiety and depression through user-written posts. Depression has previously been detected from social media data with a lexical-based approach, which weights a list of depression symptoms and their synonyms that has been validated by psychologists. However, that method does not perform well, reaching an accuracy of only 0.5. This is because lexical-based methods cannot understand the context of a sentence and are prone to OOV (Out-of-Vocabulary) problems when depression is expressed in words that are not listed in the dictionary. An alternative that can overcome this problem is a deep learning model. Machine learning algorithms are now considered successful at natural language processing tasks when large amounts of labeled data are available. In contrast, data sets for depression classification in Indonesian text are still scarce compared with English. For example, the depression data from the CLEF eRisk Laboratory contains 531K texts from 892 users extracted from Reddit, whereas the Indonesian depression data from recent research contains only 6,055 texts from 55 users extracted from Twitter.
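The OOV weakness of a lexical-based approach, as described above, can be illustrated with a minimal sketch. The lexicon entries, weights, and example sentences here are hypothetical and are not taken from the psychologist-validated symptom list used in the earlier work; the whitespace tokenizer is also a simplifying assumption.

```python
# Hypothetical symptom lexicon (term -> weight). Illustrative only; this is
# NOT the validated symptom list referred to in the summary.
SYMPTOM_WEIGHTS = {
    "sedih": 1.0,     # "sad"
    "lelah": 0.5,     # "tired"
    "insomnia": 0.8,
}


def lexicon_score(text, weights=SYMPTOM_WEIGHTS):
    """Sum the weights of lexicon terms found in a whitespace-tokenized text."""
    tokens = text.lower().split()
    # Tokens that are not in the lexicon contribute nothing to the score.
    return sum(weights.get(token, 0.0) for token in tokens)


# Both sentences express low mood, but only the first uses lexicon terms.
in_lexicon = "aku sedih dan lelah"   # "I am sad and tired"
out_of_lexicon = "aku galau banget"  # colloquial wording, absent from the lexicon

print(lexicon_score(in_lexicon))
print(lexicon_score(out_of_lexicon))
```

The second sentence scores zero because none of its words appear in the dictionary, which is the failure mode that motivates a model able to use sentence context.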
To overcome this issue, English data is used together with cross-lingual models, namely XLM-RoBERTa and Multilingual BERT. The two models address the issue through two different approaches: the feature-based approach and the fine-tuning approach. The feature-based approach performs classification using features extracted from the text by XLM-RoBERTa and Multilingual BERT, together with linguistic metadata features based on the DSM-IV and CES-D. The fine-tuning approach trains all the parameters of the XLM-RoBERTa and Multilingual BERT models on the available data. Three data scenarios are used in this study: monolingual, zero-shot, and multilingual. The results on the validation data showed that the use of linguistic metadata features has no discernible impact. The feature-based approach using Multilingual BERT with a learning rate of 1e-5 yielded an accuracy of 0.87 and an F1 score of 0.588 in the zero-shot scenario. Full fine-tuning using XLM-RoBERTa with a learning rate of 1e-5 attained an accuracy of 0.93 and an F1 score of 0.815 in the multilingual scenario, outperforming the re-run baseline method, which had an accuracy of 0.82 and an F1 score of 0.615.
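The summary reports accuracy and F1 together, and the two can diverge sharply when depressed users are a small minority of the data. The sketch below shows how both metrics follow from the same confusion counts. The counts are hypothetical and chosen only for illustration; they are not the thesis's results, and the function name is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts: 100 users, of whom 10 are labeled depressed.
# The classifier finds 5 of them and raises 3 false alarms.
accuracy, f1 = accuracy_and_f1(tp=5, fp=3, fn=5, tn=87)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Because true negatives dominate the total, accuracy stays high even though half of the positive class is missed, so the F1 score is the more informative of the two figures for the minority class.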