RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING

Rhetorical Document Profile (RDP) is an information framework used to structure the contents of a scientific paper. RDP divides the sentences of a scientific paper into 7 to 16 rhetorical categories based on the rhetorical role of each sentence. RDP can be used as structured input to other systems such as scientific...

Bibliographic Details
Main Author: Chalvin (NIM: 13514032)
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/26230
Institution: Institut Teknologi Bandung
id id-itb.:26230
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Rhetorical Document Profile (RDP) is an information framework used to structure the contents of a scientific paper. RDP divides the sentences of a scientific paper into 7 to 16 rhetorical categories based on the rhetorical role of each sentence. The resulting structured data can serve as input to other systems, such as scientific paper summarization systems. Several studies have tried to automate rhetorical classification. For the 7-category task, Teufel obtained an F-score of 0.51 using naive Bayes in 2002, and Merity et al. obtained an F-score of 0.93 using a maximum entropy classifier in 2009. Research on the 16-category task was pioneered by Widyantoro et al., who obtained an F-score of 0.25 using various techniques. In 2017, Rachman obtained an F-measure of around 0.43 using a shallow learning method with word2vec and sequence labeling. These results suggest that the feature engineering in previous research was not optimal.

Recently, much research in natural language processing has used deep learning, because deep learning can capture high-level features automatically by combining simpler features. Without deep learning, high-level features can only be obtained by building a system whose understanding of the data is similar to human understanding. With deep learning, various models built to solve natural language processing problems have achieved the best performance.

In this study, various deep learning architectures were tested to optimize feature engineering for rhetorical sentence categorization. The architectures tested include CNN, GRU, LSTM, Bi-GRU, and Bi-LSTM. These architectures were chosen because they have been shown to give good results in sentence categorization. The best model in this study achieves an F-measure of 0.457.
format Final Project
author Chalvin (NIM: 13514032)
title RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING
url https://digilib.itb.ac.id/gdl/view/26230