RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING
Rhetorical Document Profile (RDP) is an information framework used to structure the contents of a scientific paper. RDP divides sentences in scientific papers into 7 to 16 rhetorical categories based on each sentence's rhetorical function. RDP can be used as structured input to other systems such as scientific...
Main Author: | Chalvin (NIM: 13514032) |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/26230 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:26230 |
---|---|
spelling |
id-itb.:26230 2018-10-01T10:20:08Z RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING - NIM : 13514032 , Chalvin Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/26230 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Rhetorical Document Profile (RDP) is an information framework used to structure the contents of a scientific paper. RDP divides sentences in scientific papers into 7 to 16 rhetorical categories based on each sentence's rhetorical function. RDP can be used as structured input to other systems, such as scientific paper summarization systems. Several studies have tried to automate rhetorical classification. For the 7-category task, Teufel's research obtained an F-score of 0.51 using naive Bayes in 2002, and Merity et al. obtained an F-score of 0.93 using a maximum entropy classifier in 2009. Research on the 16-category task was pioneered by Widyantoro et al., who obtained an F-score of 0.25 using various techniques. Rachman obtained an F-measure of around 0.43 in 2017 using a shallow learning method with word2vec and sequence labeling. These results indicate that the feature engineering in previous research was not optimal.

Recently, much research in natural language processing has used deep learning, because deep learning can capture high-level features automatically by combining simpler features. Without deep learning, high-level features can be obtained only by building a system whose understanding of the data is similar to human understanding. With deep learning, various models built to solve problems in natural language processing have achieved the best performance.

In this study, various deep learning architectures were tested to optimize feature engineering for rhetorical sentence categorization. The architectures tested include CNN, GRU, LSTM, Bi-GRU, and Bi-LSTM. These architectures were chosen because they have been shown to give good results in sentence categorization. The best model in this study achieves an F-measure of 0.457. |
format |
Final Project |
author |
- NIM : 13514032 , Chalvin |
title |
RHETORICAL SENTENCE CATEGORIZATION IN SCIENTIFIC PAPERS WITH DEEP LEARNING |
url |
https://digilib.itb.ac.id/gdl/view/26230 |
_version_ |
1822020948233027584 |
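
The description in this record compares systems by F-score (F-measure). As a rough illustration of how such a score can be computed for a multi-class sentence labeling task, here is a minimal sketch in plain Python. It assumes macro averaging (the unweighted mean of per-class F1); the record does not say which averaging the cited studies used. The function names and the label names in the sample data are placeholders chosen for illustration, not taken from the thesis.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is not the true label
            fn[g] += 1  # the true label g was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical gold and predicted labels for six sentences.
gold = ["AIM", "BACKGROUND", "OWN", "OWN", "BACKGROUND", "AIM"]
pred = ["AIM", "OWN", "OWN", "OWN", "BACKGROUND", "BACKGROUND"]

if __name__ == "__main__":
    for label, score in sorted(per_class_f1(gold, pred).items()):
        print(f"{label}: {score:.3f}")
    print(f"macro F1: {macro_f1(gold, pred):.3f}")
```

Macro averaging weights every category equally, so rare categories pull the score down as much as frequent ones; a micro-averaged or weighted score over the same predictions could differ noticeably, which matters when comparing figures across studies.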