DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/56324
Institution: Institut Teknologi Bandung
Summary:
A citation or quotation is the reuse of one or more sentences from another written work. A
citation can reveal the citing author's opinion of the cited work, either as positive credit or as
negative criticism. It can also reveal which part of the cited work the author is drawing on, such
as its background, methods, or experimental results. The positive or negative opinion is called
the citation's sentiment, and the purpose for which the author cites the work is called its intent.
Knowing these two aspects helps readers grasp the context of a scientific work. In particular, it
can help medical researchers compile research material on the COVID-19 pandemic from the CORD-19
dataset.
In this final project, we compare current state-of-the-art (SOTA) NLP Transformer models, SciBERT
and XLNet, to find the best-performing model for sentiment and intent classification of citation
sentences. SciBERT is a variant of BERT, which has achieved top results on a wide range of NLP
tasks since its release in 2018. SciBERT is pretrained on more than 1,000,000 scientific papers so
that it captures the language of the scientific domain. XLNet, a newer Transformer model, was
reported by Mercier et al. (2020) to achieve SOTA results on these two classification tasks.
The models were fine-tuned on the SciCite and ACL-ARC datasets, which contain more than 11,020
and 7,000 instances respectively, while varying hyperparameters such as the number of epochs and
the learning rate; the F1 metric was used to select the best model. The system consists of two
Transformer models, one that classifies citation text into sentiment classes and one that
classifies it into intent classes, and it visualizes the results as a network graph. Experiments
and analysis show that SciBERT achieves the highest macro-averaged F1 score on both tasks: 0.87
for sentiment classification and 0.83 for intent classification. Despite these high F1 scores,
both models still struggle to recognize the medical terms and pathogen names found in the CORD-19
dataset.
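The macro-averaged F1 score used to compare the models is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how often it occurs. Below is a minimal Python sketch, assuming SciCite-style intent labels (`background`, `method`, `result`). The function names `macro_f1` and `select_best_run`, the example predictions, and the hyperparameter values are hypothetical and only illustrate how a best fine-tuning run could be chosen by macro F1. scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def select_best_run(runs):
    """Return the run (hypothetical config + predictions) with the highest macro F1."""
    return max(runs, key=lambda run: macro_f1(run["y_true"], run["y_pred"]))


# Hypothetical gold labels and predictions from two fine-tuning runs.
gold = ["background", "method", "result", "background", "method"]
runs = [
    {"epochs": 3, "learning_rate": 2e-5, "y_true": gold,
     "y_pred": ["background", "method", "background", "background", "result"]},
    {"epochs": 4, "learning_rate": 3e-5, "y_true": gold,
     "y_pred": ["background", "method", "result", "background", "background"]},
]

best = select_best_run(runs)
print(best["epochs"], best["learning_rate"], round(macro_f1(best["y_true"], best["y_pred"]), 3))
# → 4 3e-05 0.822
```

In the first run the `result` class gets an F1 of 0, which pulls the macro average down to about 0.49 even though three of the five predictions are correct.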
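The abstract does not say how the network graph is built or drawn. One possible sketch assumes that each classified citation is stored as a record with hypothetical `citing`, `cited`, `sentiment`, and `intent` fields. It groups the records into an adjacency map and emits Graphviz DOT text whose edges are labelled with the predicted sentiment and intent. The paper IDs and labels are made up, and the choice of DOT as the output format is an assumption; a tool such as Graphviz could then render the result.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Map each citing paper to a list of (cited paper, sentiment, intent) edges."""
    graph = defaultdict(list)
    for c in citations:
        graph[c["citing"]].append((c["cited"], c["sentiment"], c["intent"]))
    return dict(graph)


def to_dot(graph):
    """Serialize the graph as Graphviz DOT, labelling each edge 'sentiment/intent'.

    Assumes paper IDs and labels contain no double quotes (no escaping is done).
    """
    lines = ["digraph citations {"]
    for citing in sorted(graph):
        for cited, sentiment, intent in graph[citing]:
            lines.append(f'  "{citing}" -> "{cited}" [label="{sentiment}/{intent}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical classifier output for three citation sentences.
citations = [
    {"citing": "paper_A", "cited": "paper_B", "sentiment": "positive", "intent": "method"},
    {"citing": "paper_A", "cited": "paper_C", "sentiment": "negative", "intent": "result"},
    {"citing": "paper_B", "cited": "paper_C", "sentiment": "positive", "intent": "background"},
]

print(to_dot(build_citation_graph(citations)))
```

Keeping the graph as a plain adjacency map leaves the rendering step open, so the same structure could be passed to another graph library instead of DOT.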