DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL
A citation is a passage of one or more sentences taken from another written work. A citation can reveal the citing author's opinion, expressed as positive credit or negative criticism. It can also reveal what the author is citing, such as the backgro...
Main Author: | Mahendra Guntara Harsono, Rayza |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/56324 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:56324 |
---|---|
spelling |
id-itb.:56324 2021-06-22T06:38:31Z Mahendra Guntara Harsono, Rayza Indonesia Final Project citation, sentiment, intent, transformer model INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/56324 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
A citation is a passage of one or more sentences taken from another written work. A citation
can reveal the citing author's opinion, expressed as positive credit or negative criticism. It can
also reveal what the author is citing, such as a work's background, methods, or experimental
results. The positive or negative opinion is called sentiment, and the purpose of the citation
is called intent. Knowing both helps readers understand the context of a scientific work. In
particular, it can help medical researchers compile research material on the COVID-19 pandemic
from the CORD-19 dataset.
This final project looks for the best-performing model for sentiment and intent
classification of citation sentences among current state-of-the-art (SOTA) NLP Transformer
models, such as SciBERT and XLNet. SciBERT is a variant of BERT, which has achieved top
performance on various NLP tasks since 2018. SciBERT is pretrained on more than 1,000,000
scientific papers so that it captures the context of the scientific domain. XLNet, a newer
Transformer model, was reported by Mercier et al. (2020) to be SOTA on both classification
tasks.
The experiments fine-tune the models on the SciCite and ACL-ARC datasets, which contain more
than 11,020 and 7,000 samples, respectively, varying hyperparameters such as the number of
epochs and the learning rate and using the F1 metric to select the best model. The system
consists of two Transformer models that classify citation text into sentiment and intent
classes, with the results visualized as a network graph. Experiments and analysis show that
SciBERT achieves the highest macro-averaged F1 on both tasks: 0.87 for sentiment classification
and 0.83 for intent classification. Despite the high F1 scores, both models still have difficulty
recognizing the medical terms and pathogen names found in the CORD-19 dataset.
|
format |
Final Project |
author |
Mahendra Guntara Harsono, Rayza |
spellingShingle |
Mahendra Guntara Harsono, Rayza DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
author_facet |
Mahendra Guntara Harsono, Rayza |
author_sort |
Mahendra Guntara Harsono, Rayza |
title |
DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
title_short |
DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
title_full |
DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
title_fullStr |
DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
title_full_unstemmed |
DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL |
title_sort |
development of sentiment analysis and intent classification of scientific journal's citation model |
url |
https://digilib.itb.ac.id/gdl/view/56324 |
_version_ |
1822002329633685504 |
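
The description above reports macro-averaged F1 as the metric used to compare models. As a reminder of what that metric measures, the following is a minimal sketch in plain Python. The function name `macro_f1` and the toy intent labels are illustrative assumptions and are not taken from the thesis; the thesis's own evaluation code is not shown in this record.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    # Every class counts equally, regardless of how many samples it has.
    return sum(scores) / len(scores)


# Hypothetical toy data: gold and predicted intent labels for six citations.
gold = ["background", "method", "result", "background", "method", "background"]
pred = ["background", "method", "background", "background", "result", "background"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a model that does well only on the most frequent class scores lower on macro F1 than on plain accuracy, which is one reason the metric suits class-imbalanced citation datasets.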