GENERATIVE SENTIMENT ANALYSIS ON INDONESIA'S ONLINE NEWS USING DIRECT QUOTATION SENTENCES
Main Author: | Alexander Audino, Rio |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/82483 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:82483 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Sentiment analysis of direct quotations aims to extract public figures' opinions
on a particular issue from the direct quotations in news articles. A direct
quotation reproduces a person's own words, so it can be treated as that person's
stated opinion. Traditional sentiment analysis methods cannot be applied directly
to direct quotations. The process has three stages: extraction, attribution, and
polarity analysis. Together, these stages extract the direct quotations, identify
the speaker of each quotation, determine its target, and classify its polarity.
Previous studies performed this analysis with a regex system and Named Entity
Recognition (NER). That approach performed poorly because the system could not
take the context of the whole article into account. To address this, a generative
approach is used to process the full article: the model takes a news document as
input and outputs the direct quotations together with their speakers, targets,
and polarities. A variant of the approach uses regex for the quote extraction
stage, which reduces the resources the generative model requires.
The dataset was built with the help of GPT-4 to increase the amount of data,
yielding 1,000 training documents. Fifty test news documents were annotated by an
annotator. The generative model was fine-tuned on the training data and evaluated
on the test data. Experimental results show that the regex-assisted system
performs best: using the IndoT5-base-paraphrase model with regex assistance, it
achieves F1 scores of 0.99 for quote extraction, 0.99 for speaker extraction,
0.74 for quote targets, and 0.81 for polarity analysis. |
format |
Final Project |
author |
Alexander Audino, Rio |
title |
GENERATIVE SENTIMENT ANALYSIS ON INDONESIA'S ONLINE NEWS USING DIRECT QUOTATION SENTENCES |
url |
https://digilib.itb.ac.id/gdl/view/82483 |
_version_ |
1822997718453190656 |
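
The abstract mentions a regex-based quote extraction stage and reports F1 scores, but this record gives neither the regex nor the matching criterion. The following is only a minimal sketch of how such a stage might look: the pattern (text between straight or curly double quotation marks), the exact-match F1, the function names, and the sample sentence are all assumptions made for illustration, not the thesis's implementation.

```python
import re
from typing import List

# Assumed pattern: any text enclosed in straight (") or curly (“ ”) double quotes.
# The regex actually used in the thesis is not given in this record.
QUOTE_PATTERN = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def extract_direct_quotes(text: str) -> List[str]:
    """Return the text found inside each pair of double quotation marks."""
    return [match.group(1).strip() for match in QUOTE_PATTERN.finditer(text)]


def f1_score(predicted: List[str], gold: List[str]) -> float:
    """Exact-match F1 between predicted and gold quotes (assumed criterion)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical news snippet, written only for this example.
    news = (
        'Menteri itu mengatakan, "Harga beras akan segera stabil." '
        'Namun seorang pedagang membantah. '
        '"Harga masih naik setiap minggu," ujarnya.'
    )
    quotes = extract_direct_quotes(news)
    print(quotes)
    print(f1_score(quotes, ["Harga beras akan segera stabil."]))
```

In a regex-assisted setup like the one the abstract describes, quotes found this way could be passed to the generative model so that it only has to produce the speaker, target, and polarity. How the thesis actually wires the two stages together is not stated here.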