GENERATIVE SENTIMENT ANALYSIS ON INDONESIA’S ONLINE NEWS USING DIRECT QUOTATION SENTENCES


Bibliographic Details
Main Author: Alexander Audino, Rio
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/82483
Institution: Institut Teknologi Bandung
Description
Summary: Sentiment analysis of direct quote sentences aims to extract public figures' opinions on a particular matter from direct quotes in news articles. A direct quote is the verbatim speech of an individual and can therefore be treated as that person's own opinion. Traditional sentiment analysis methods cannot be applied to direct quote sentences as they are. The process has three stages: extraction, attribution, and polarity analysis. Together, these stages extract the direct quote sentences, identify the speaker of each quote, determine the target of the quote, and analyze its polarity. Previous studies performed this task with a regex system and Named Entity Recognition (NER). That approach did not perform well because the system could not take the full news context into account. To address this issue, a generative approach can be used to process the entire news document. The model takes a news document as input and outputs the direct quote sentences, their speakers, their targets, and their polarities. A variation of this approach uses regex in the direct quote extraction stage, which reduces the resources the generative model requires. The dataset was constructed with the help of GPT-4 to increase the quantity of data, yielding 1,000 training documents. Fifty test news documents were annotated by an annotator. The generative model was fine-tuned on the training data and evaluated on the test data. Experimental results show that the regex-assisted system achieves the best performance. The system using the IndoT5-base-paraphrase model with regex assistance achieves F1 scores of 0.99 for quote extraction, 0.99 for speaker extraction, 0.74 for quote targets, and 0.81 for polarity analysis.
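
The regex-based extraction stage mentioned in the summary can be illustrated with a minimal Python sketch. This is not the thesis's actual implementation: the pattern, the function name extract_direct_quotes, and the min_words threshold are all assumptions made for illustration. The sketch assumes quotes are delimited by straight or curly double quotation marks and drops very short quoted spans, which are more likely to be titles or terms than statements.

```python
import re

# Assumption: direct quotes are enclosed in straight (") or curly (“ ”) double quotes.
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')


def extract_direct_quotes(text, min_words=3):
    """Return quoted spans that contain at least `min_words` words.

    `min_words` is a hypothetical heuristic for skipping quoted names
    or terms that are unlikely to be statements.
    """
    quotes = []
    for match in QUOTE_PATTERN.finditer(text):
        # Trim whitespace and a trailing comma left before the closing mark.
        quote = match.group(1).strip().rstrip(",").strip()
        if len(quote.split()) >= min_words:
            quotes.append(quote)
    return quotes
```

Under this sketch, only the extracted quotes (rather than the whole article) would need speaker, target, and polarity predictions from the generative model, which is one plausible way the regex step could reduce the model's workload.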