MAGNET ARCHITECTURE OPTIMIZATION ON MULTI-LABEL TEXT CLASSIFICATION

Bibliographic Details
Main Author: Adrinta Abdurrazzaq, Muhammad
Format: Thesis
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/58050
Institution: Institut Teknologi Bandung
Description
Summary: Multi-label text classification is the task of assigning each text to one or more categories. MAGNET is a deep learning architecture that combines Graph Attention Networks, a BiLSTM, and BERT embeddings to address this task; it uses Graph Attention Networks to capture dependency information between labels through the attention mechanism. However, MAGNET has difficulty handling data with many labels, because the resulting adjacency matrix becomes very large and the model becomes hard to train, requiring substantial computational resources. In this research, label clustering is used to reduce the dimension of the adjacency matrix: the labels are first grouped into several clusters, and the labels within the same cluster then form their own adjacency matrix. The Louvain algorithm is used to cluster the labels; since it operates on graph data structures, the adjacency matrix, which represents the graph of dependencies between labels, can be used directly as its input. In addition, a fine-tuning layer based on BiGRU and an embedding method based on XLNet are explored, since BiGRU and XLNet have shown better performance than BiLSTM and BERT in other research.

The results show that the two proposed architectures, tested on three different datasets, achieve performance similar to or better than the base MAGNET architecture. However, when clustering discards too much of the dependency weight between labels, the proposed architecture cannot match the model built on the base MAGNET architecture. Meanwhile, the combination of BiGRU and XLNet embeddings outperformed the combination of BiLSTM and BERT embeddings used in previous research.
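To make the clustering step concrete, the following is a minimal sketch (not the thesis code) of how a label-dependency graph can be built from co-occurrence counts and partitioned with NetworkX's Louvain implementation. The multi-hot label matrix Y, the co-occurrence weighting, and the helper name cluster_labels are illustrative assumptions; the thesis may define label dependencies differently.

    # Sketch: Louvain clustering of a label co-occurrence graph.
    # Requires NetworkX >= 2.8 for nx.community.louvain_communities.
    import numpy as np
    import networkx as nx

    def cluster_labels(Y: np.ndarray, seed: int = 0):
        """Group labels via Louvain on their co-occurrence graph and
        return one (indices, adjacency) pair per cluster."""
        # Weighted label-dependency graph: A[i, j] counts how often
        # labels i and j appear on the same sample.
        A = Y.T @ Y
        np.fill_diagonal(A, 0)
        G = nx.from_numpy_array(A)  # nonzero entries become weighted edges

        # Louvain community detection partitions the label graph.
        clusters = nx.community.louvain_communities(G, weight="weight", seed=seed)

        # Labels in the same cluster form their own, much smaller,
        # adjacency matrix, so no single matrix spans all labels.
        sub_matrices = []
        for community in clusters:
            idx = sorted(community)
            sub_matrices.append((idx, A[np.ix_(idx, idx)]))
        return sub_matrices

    # Toy example: 6 samples, 5 labels.
    Y = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 0, 0, 1],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 1, 0],
                  [1, 0, 0, 0, 1],
                  [0, 0, 1, 0, 0]])
    for idx, sub in cluster_labels(Y):
        print("cluster", idx, "adjacency:\n", sub)

Each per-cluster adjacency matrix is far smaller than the full label matrix, which is where the computational savings described in the abstract come from; the trade-off, as the results note, is that dependency weight between labels in different clusters is discarded.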
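The second proposed change, replacing BiLSTM with BiGRU and BERT with XLNet while keeping graph attention over label embeddings, could look roughly like the following PyTorch sketch. The layer sizes, the mean pooling, the single-head graph-attention layer, and the class name MagnetVariant are simplified assumptions rather than the thesis implementation; the adjacency mask adj would be one of the per-cluster matrices produced above, binarized and with self-loops added.

    # Sketch: XLNet embeddings + BiGRU fine-tuning layer + graph attention
    # over label embeddings. Assumes torch and HuggingFace transformers.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from transformers import XLNetModel

    class MagnetVariant(nn.Module):
        def __init__(self, num_labels: int, hidden: int = 256):
            super().__init__()
            self.encoder = XLNetModel.from_pretrained("xlnet-base-cased")
            d = self.encoder.config.hidden_size  # 768 for xlnet-base-cased
            # BiGRU replaces MAGNET's BiLSTM as the fine-tuning layer.
            self.bigru = nn.GRU(d, hidden, batch_first=True, bidirectional=True)
            # Learnable label embeddings, one per label in the cluster.
            self.label_emb = nn.Parameter(torch.randn(num_labels, 2 * hidden))
            # Single-head graph-attention weights (simplified GAT).
            self.W = nn.Linear(2 * hidden, 2 * hidden, bias=False)
            self.a = nn.Linear(4 * hidden, 1, bias=False)

        def gat(self, h, adj):
            # h: (num_labels, dim); adj: 0/1 mask with self-loops, so
            # every row has at least one neighbor (avoids NaN in softmax).
            Wh = self.W(h)
            n = Wh.size(0)
            pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                               Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(self.a(pairs), 0.2).squeeze(-1)
            e = e.masked_fill(adj == 0, float("-inf"))
            alpha = torch.softmax(e, dim=-1)  # attend only to linked labels
            return alpha @ Wh

        def forward(self, input_ids, attention_mask, adj):
            tokens = self.encoder(input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            seq, _ = self.bigru(tokens)             # (B, T, 2*hidden)
            doc = seq.mean(dim=1)                   # pooled document vector
            labels = self.gat(self.label_emb, adj)  # dependency-aware labels
            return doc @ labels.T                   # per-label logits
                                                    # (train with BCEWithLogitsLoss)

The per-label logits are scored against the multi-hot targets with a sigmoid-based loss, which is standard for multi-label classification; with several clusters, one such attention block per cluster would be applied and the logits concatenated.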