TOKEN CLASSIFICATION ON INDONESIAN ARTICLE FOR 5W1H EVENT EXTRACTION WITH CNN-BIDIRECTIONAL LSTM
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/21251 |
Institution: | Institut Teknologi Bandung |
Summary: | A news article contains information about an event: what happened (what), who was involved (who), where and when it took place (where, when), and why and how it happened (why, how). Together, these elements are known as 5W1H information.

Knowing the 5W1H information in a text or document makes its overall content easy to understand, and this information can be identified automatically by applying information extraction techniques.

5W1H information can be extracted from an Indonesian article by classifying each token into one of 13 classes: B-who, I-who, B-what, I-what, B-when, I-when, B-where, I-where, B-why, I-why, B-how, I-how, and Other. A B- label marks the first token of a 5W1H phrase, an I- label marks a subsequent token of the same phrase, and Other marks tokens outside any phrase. Contextual information about each token at both the lexical and sentence levels is used to determine its label. A Convolutional Neural Network (CNN) extracts syntactic and semantic features from sentences, while a Bidirectional Long Short-Term Memory (BLSTM) network models the token sequence at the lexical level. The CNN-BLSTM model achieves an average F-measure of 0.808 with a feature set consisting of token features and the relative positions of tokens within sentences (SENT), lexical sequence features (LEX), and token location (LOCT and LOCS). Experimental results show that this deep learning method outperforms the shallow methods IBk, C4.5, and Naïve Bayes, which obtain F-measures of 0.655, 0.645, and 0.595, respectively.
|
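The 13-class labeling in the summary follows the usual B-/I-/Other span-tagging convention. The following Python sketch is illustrative and not taken from the thesis: it builds that label set and decodes a tagged token sequence into 5W1H phrases. The function name `decode_5w1h`, the rule that an I- tag which does not continue the open phrase is dropped, and the example sentence with its tags are all hypothetical.

```python
# Illustrative sketch (not the thesis's code): the 13-label scheme from the
# summary plus a BIO-style decoder that groups tagged tokens into 5W1H phrases.
ELEMENTS = ["who", "what", "when", "where", "why", "how"]

# 6 elements x {B-, I-} + "Other" = 13 classes, in the order the summary lists them.
LABELS = [f"{prefix}-{e}" for e in ELEMENTS for prefix in ("B", "I")] + ["Other"]


def decode_5w1h(tokens, tags):
    """Group tokens into phrases per 5W1H element (hypothetical helper).

    "B-x" opens a new phrase for element x; "I-x" extends the open phrase
    only if it is also for x. "Other", or an I- tag that does not continue
    the open phrase, closes it (the orphan I- token itself is dropped).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    phrases = {e: [] for e in ELEMENTS}
    open_elem, open_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, elem = tag.partition("-")
        if prefix == "I" and elem == open_elem:
            open_tokens.append(token)
            continue
        # Any other tag ends the currently open phrase, if there is one.
        if open_elem is not None:
            phrases[open_elem].append(" ".join(open_tokens))
        if prefix == "B":
            open_elem, open_tokens = elem, [token]
        else:
            open_elem, open_tokens = None, []

    # Close a phrase that runs to the end of the sequence.
    if open_elem is not None:
        phrases[open_elem].append(" ".join(open_tokens))
    return phrases


if __name__ == "__main__":
    # Hypothetical Indonesian sentence with hand-assigned tags.
    tokens = ["Presiden", "Joko", "Widodo", "meresmikan", "bendungan",
              "di", "Bandung", "pada", "Senin", "."]
    tags = ["B-who", "I-who", "I-who", "B-what", "I-what",
            "B-where", "I-where", "B-when", "I-when", "Other"]
    for element, found in decode_5w1h(tokens, tags).items():
        print(element, found)
```

Dropping an orphan I- tag is one common choice; another is to treat it as the start of a new phrase, which recovers more text from noisy predictions at the cost of occasionally splitting a phrase.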
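The summary reports average F-measures without stating how they were averaged. As a hedged sketch, the Python code below computes token-level precision, recall, and F-measure for each label and then averages them either uniformly over labels (macro) or weighted by each label's gold count. Token-level scoring, both averaging schemes, and the names `per_label_scores` and `average_f1` are assumptions for illustration; which scheme the thesis used, if either, is left open.

```python
# Illustrative sketch: token-level precision, recall and F-measure per label,
# averaged either uniformly (macro) or by gold-label count (weighted).
# Token-level scoring and both averaging schemes are assumptions; the
# summary does not say how its F-measure averages were computed.
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1, support)}."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where the gold label was different
            fn[g] += 1  # missed the gold label g
    support = Counter(gold)
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1, support[label])
    return scores


def average_f1(scores, weighted=False):
    """Macro average of per-label F1, or support-weighted if weighted=True."""
    if weighted:
        total = sum(s for _, _, _, s in scores.values())
        return sum(f1 * s for _, _, f1, s in scores.values()) / total
    return sum(f1 for _, _, f1, _ in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy gold and predicted tags, invented for illustration.
    gold = ["B-who", "I-who", "I-who", "Other", "Other", "B-when"]
    pred = ["B-who", "I-who", "Other", "Other", "Other", "Other"]
    scores = per_label_scores(gold, pred)
    # prints: 0.5833 0.6111
    print(round(average_f1(scores), 4), round(average_f1(scores, weighted=True), 4))
```

The two averages diverge whenever labels have unequal support; with a large Other class, a weighted average tends to look better than a macro average, so the choice matters when comparing reported figures.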
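The BLSTM component can be summarized by its standard formulation. This is a generic sketch rather than the thesis's architecture, which the summary does not detail: $x_t$ is the input vector for token $t$ (assumed here to combine the CNN-derived sentence features with the token's own features), $d$ is the hidden size of each direction, and the softmax output layer over the 13 labels is an assumption.

$$
\overrightarrow{h}_t = \mathrm{LSTM}_{\mathrm{fwd}}\left(x_t,\ \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bwd}}\left(x_t,\ \overleftarrow{h}_{t+1}\right)
$$

$$
h_t = \left[\overrightarrow{h}_t ;\ \overleftarrow{h}_t\right] \in \mathbb{R}^{2d}, \qquad
\hat{y}_t = \mathrm{softmax}\left(W h_t + b\right), \quad W \in \mathbb{R}^{13 \times 2d},\ b \in \mathbb{R}^{13}
$$

Under this assumption, each token receives the label with the largest probability in $\hat{y}_t$. Because $h_t$ joins a left-to-right and a right-to-left state, the label of a token can depend on words both before and after it, which is what lets a word such as "pada" be tagged B-when only when a time expression follows.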