QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL
Main Author: | Ayyub Abdurrahman, Muhammad |
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/69111 |
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:69111 |
spelling |
id-itb.:69111 2022-09-20T12:08:14Z QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL Ayyub Abdurrahman, Muhammad Indonesia Final Project quantization, BERT model, model compression INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/69111 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In recent years, the use of pre-trained models has dominated computational research in various fields, including natural language processing. One prominent pre-trained model is Bidirectional Encoder Representations from Transformers (BERT). BERT has become a state-of-the-art model and has been adapted to various languages, including the Indonesian-language BERT, IndoBERT. Like the original BERT model, IndoBERT is large, which raises issues of latency and efficiency. To alleviate this efficiency issue, in this study we explore the use of quantization to compress IndoBERT.

Quantization is a technique for computing and storing tensors at lower bit precision. Its advantage is that it only changes the bit width of the model weights, so the model architecture does not need to be modified and no effort is required to design a smaller model. Furthermore, quantization typically causes only a very small drop in performance, or none at all. Popular quantization methods are post-training quantization and quantization-aware training. Post-training quantization reduces the bit precision of the weights after fine-tuning. Quantization-aware training inserts quantization operations into the model during training/fine-tuning so that the model adapts to the quantized weights and activations.

Experiments were carried out on 7 downstream tasks, and the results show that the quantized model performs well compared to the full-precision model. Performance does decrease in extreme cases, such as 4-bit quantization. The experiments also show that sequence labeling downstream tasks are more sensitive to quantization. The results also show that the performance drop can be minimized by using the quantization-aware training method. |
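The abstract describes post-training quantization only at a high level. As a rough illustration (not the thesis' actual setup), the sketch below applies PyTorch dynamic post-training quantization to an IndoBERT checkpoint; the checkpoint name `indobenchmark/indobert-base-p1` and the sequence-classification head are assumptions made for this example.

```python
# Hypothetical sketch: post-training (dynamic) quantization of an IndoBERT
# checkpoint with PyTorch. The checkpoint name and task head are assumptions;
# the thesis does not specify the exact toolkit or model variant used.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobert-base-p1",  # assumed IndoBERT checkpoint
    num_labels=2,
)
model.eval()

# Convert Linear-layer weights from fp32 to int8 after fine-tuning; activations
# are quantized on the fly at inference time, so no retraining is required.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

print(quantized_model)  # Linear layers now appear as DynamicQuantizedLinear
```

This kind of dynamic quantization shrinks the stored weights of the quantized layers roughly fourfold and needs no calibration data, which matches the abstract's point that the model architecture itself is left untouched.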
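Quantization-aware training is likewise described only conceptually. The following is a minimal, generic sketch of its core mechanism, fake quantization with a straight-through estimator, not the author's implementation; the function name `fake_quantize`, the per-tensor symmetric scheme, and the bit widths are illustrative choices.

```python
# Generic sketch of the fake-quantization step used in quantization-aware
# training: weights are rounded to a low-bit grid in the forward pass, while
# the straight-through estimator lets gradients flow as if no rounding happened.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round w to a symmetric num_bits grid, keeping gradients unchanged."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = w.detach().abs().max() / qmax     # per-tensor symmetric scale
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()             # forward: w_q, backward: grad of w

# Toy usage: quantize a weight matrix inside the forward computation so the
# parameters are trained to stay accurate after rounding.
weight = torch.nn.Parameter(0.02 * torch.randn(768, 768))
x = torch.randn(4, 768)
out = x @ fake_quantize(weight, num_bits=4).t()   # e.g. simulate 4-bit weights
out.sum().backward()
print(weight.grad.shape)  # gradients reach `weight` despite the rounding
```

Because every forward pass during fine-tuning sees the rounded weights, the loss directly penalizes parameters that suffer from low-precision rounding. That is the intuition behind the abstract's observation that quantization-aware training recovers most of the accuracy lost at extreme bit widths such as 4 bits, though the exact scheme used in the thesis may differ.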
format |
Final Project |
author |
Ayyub Abdurrahman, Muhammad |
title |
QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL |
url |
https://digilib.itb.ac.id/gdl/view/69111 |