RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING

Bibliographic Details
Main Author: Chalvin (NIM: 13514032)
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/26230
Institution: Institut Teknologi Bandung
Description
Summary: Rhetorical Document Profile (RDP) is an information framework used to structure the contents of a scientific paper. RDP assigns each sentence in a scientific paper to one of 7 to 16 rhetorical categories according to the sentence's rhetorical role. RDP can serve as structured input to other systems, such as scientific paper summarization systems. Several studies have tried to automate rhetorical classification. For 7-category rhetorical sentence classification, Teufel's research obtained an F-score of 0.51 using naive Bayes in 2002, and Merity et al. obtained an F-score of 0.93 using a maximum entropy classifier in 2009. Research on 16-category classification was pioneered by Widyantoro et al., who obtained an F-score of 0.25 using various techniques. In 2017, Rachman obtained an F-score of around 0.43 using shallow learning methods with word2vec and sequence labeling. This indicates that the feature engineering in previous research was not optimal.

Recently, much research in natural language processing has used deep learning, because deep learning can capture high-level features automatically by combining simpler ones. Without deep learning, such high-level features could only be obtained by building a system whose understanding of the data resembles human understanding. Models built with deep learning have achieved the best performance on a wide range of natural language processing problems.

In this study, several deep learning architectures were tested to optimize feature engineering for rhetorical sentence categorization: CNN, GRU, LSTM, Bi-GRU, and Bi-LSTM. These architectures were chosen because they have been shown to perform well on sentence classification. The best model in this study achieves an F-score of 0.457.
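Every result in the summary is reported as an F-score (also called F-measure), but the summary does not say how the per-category scores are combined across the 7 or 16 rhetorical categories. The sketch below, in plain Python and assuming single-label classification (one category per sentence), shows two common ways to aggregate: macro-averaging, which weights every category equally, and micro-averaging, which for single-label data reduces to accuracy. The label names and the toy predictions are hypothetical and chosen only for illustration; they are not data from this study.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted category p, but it was wrong
            fn[g] += 1  # the true category g was missed
    scores = {}
    for label in set(gold) | set(pred):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-category F1: rare categories count as much as common ones."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def micro_f1(gold, pred):
    """Pooled F1; with exactly one label per sentence this equals accuracy."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy example: six sentences with illustrative category names.
gold = ["AIM", "OWN", "OWN", "OTHER", "OWN", "BACKGROUND"]
pred = ["AIM", "OWN", "OTHER", "OTHER", "OWN", "OWN"]

print(round(macro_f1(gold, pred), 3))  # 0.583
print(round(micro_f1(gold, pred), 3))  # 0.667
```

On the same predictions, the macro average is pulled down by the completely missed BACKGROUND sentence, while the micro average is not. This is one reason F-scores from different studies are only comparable when they use the same averaging scheme and the same category set.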