DEVELOPMENT OF SENTIMENT ANALYSIS AND INTENT CLASSIFICATION OF SCIENTIFIC JOURNAL'S CITATION MODEL

Bibliographic Details
Main Author: Mahendra Guntara Harsono, Rayza
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/56324
Summary: A citation, or quote, is the borrowing of one or more sentences from another written work. A citation can reveal the author's opinion of the cited work, either as positive credit or as negative criticism. It can also reveal what the author is citing, such as the background of the journal, its methods, or its experimental results. The positive or negative opinion is called sentiment, while the purpose behind the citation is called intent. Knowing these two aspects helps in understanding the context of a scientific work. In particular, this can assist medical researchers in compiling research material on the COVID-19 pandemic from the CORD-19 dataset. In this final project, we look for the highest-performing model for sentiment and intent classification of citation sentences among current state-of-the-art (SOTA) NLP Transformer models, such as SciBERT and XLNet. SciBERT is a variant of BERT, which has been the highest-performing model on various NLP tasks since 2018. SciBERT is pretrained on more than 1,000,000 scientific papers so that it captures the context of the scientific domain. XLNet, a newer Transformer model, was shown by Mercier to be SOTA on these two classification tasks (Mercier et al., 2020). The experiments fine-tune the models on the SciCite and ACL-ARC datasets, which contain more than 11,020 and 7,000 samples, respectively, with various hyperparameters, such as the number of epochs and the learning rate, and use the F1 metric to determine the best model. The system design consists of two Transformer models that classify the citation text into sentiment and intent classes, and the results are visualized as a network graph. Experiments and analysis show that SciBERT achieves the highest macro-averaged F1 on both tasks, with a score of 0.87 on sentiment classification and 0.83 on intent classification.
Despite the high F1 scores, both models still have difficulty recognizing the medical terms and pathogen names found in the CORD-19 dataset.
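
The macro-averaged F1 used above to compare models can be illustrated with a minimal sketch. This is not the project's evaluation code: the function name `macro_f1` and the example labels are assumptions chosen for illustration. Macro averaging computes F1 separately for each class and then takes the unweighted mean, so rare classes count as much as frequent ones.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative intent labels only; not taken from the project's data.
y_true = ["background", "method", "result", "method"]
y_pred = ["background", "method", "method", "method"]
print(round(macro_f1(y_true, y_pred), 2))
```

Because every class contributes equally, a model that never predicts a minority class (here "result") is penalized heavily, which is why this metric suits imbalanced citation datasets.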