QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL
Main Author: | Ayyub Abdurrahman, Muhammad |
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/69111 |
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:69111 |
spelling |
id-itb.:69111 2022-09-20T12:08:14Z QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL Ayyub Abdurrahman, Muhammad Indonesia Final Project quantization, BERT model, model compression INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/69111 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In recent years, the use of pre-trained models has dominated computational research in various fields, including natural language processing. One prominent pre-trained model is Bidirectional Encoder Representations from Transformers (BERT). BERT has become a state-of-the-art model and has been adapted to various languages, including the Indonesian-language BERT, IndoBERT. Like the original BERT model, IndoBERT is large, which raises issues of latency and efficiency. To alleviate this efficiency issue, in this study we explore the use of quantization to compress IndoBERT.

Quantization is a technique for computing and storing tensors at lower bit precision. Its advantage is that it only changes the bit width of the model weights, so the model architecture does not need to be modified and no effort is required to design a smaller model. Furthermore, quantization typically causes only a very small drop in performance, or none at all. Popular quantization methods are post-training quantization and quantization-aware training. Post-training quantization reduces the bit precision of the weights after fine-tuning. Quantization-aware training inserts quantization operations into the model during training/fine-tuning so that the model adapts to the quantized weights and activations.

Experiments were carried out on 7 downstream tasks, and the results show that the quantized model performs well compared to the full-precision model. Performance does decrease in extreme cases, such as 4-bit quantization. The experiments also show that sequence labeling downstream tasks are more sensitive to quantization. The results also show that the performance drop can be minimized by using the quantization-aware training method. |
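The abstract describes post-training quantization only at a high level. As a rough illustration (not the thesis' actual setup), the sketch below applies PyTorch dynamic post-training quantization to an IndoBERT checkpoint; the checkpoint name `indobenchmark/indobert-base-p1` and the sequence-classification head are assumptions made for this example.

```python
# Hypothetical sketch: post-training (dynamic) quantization of an IndoBERT
# checkpoint with PyTorch. The checkpoint name and task head are assumptions;
# the thesis does not specify the exact toolkit or model variant used.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobert-base-p1",  # assumed IndoBERT checkpoint
    num_labels=2,
)
model.eval()

# Convert Linear-layer weights from fp32 to int8 after fine-tuning; activations
# are quantized on the fly at inference time, so no retraining is required.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

print(quantized_model)  # Linear layers now appear as DynamicQuantizedLinear
```

This kind of dynamic quantization shrinks the stored weights of the quantized layers roughly fourfold and needs no calibration data, which matches the abstract's point that the model architecture itself is left untouched.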
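Quantization-aware training is likewise described only conceptually. The following is a minimal, generic sketch of its core mechanism, fake quantization with a straight-through estimator, not the author's implementation; the function name `fake_quantize`, the per-tensor symmetric scheme, and the bit widths are illustrative choices.

```python
# Generic sketch of the fake-quantization step used in quantization-aware
# training: weights are rounded to a low-bit grid in the forward pass, while
# the straight-through estimator lets gradients flow as if no rounding happened.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round w to a symmetric num_bits grid, keeping gradients unchanged."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = w.detach().abs().max() / qmax     # per-tensor symmetric scale
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()             # forward: w_q, backward: grad of w

# Toy usage: quantize a weight matrix inside the forward computation so the
# parameters are trained to stay accurate after rounding.
weight = torch.nn.Parameter(0.02 * torch.randn(768, 768))
x = torch.randn(4, 768)
out = x @ fake_quantize(weight, num_bits=4).t()   # e.g. simulate 4-bit weights
out.sum().backward()
print(weight.grad.shape)  # gradients reach `weight` despite the rounding
```

Because every forward pass during fine-tuning sees the rounded weights, the loss directly penalizes parameters that suffer from low-precision rounding. That is the intuition behind the abstract's observation that quantization-aware training recovers most of the accuracy lost at extreme bit widths such as 4 bits, though the exact scheme used in the thesis may differ.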
format |
Final Project |
author |
Ayyub Abdurrahman, Muhammad |
title |
QUANTIZATION IMPLEMENTATION OF INDONESIAN BERT LANGUAGE MODEL |
url |
https://digilib.itb.ac.id/gdl/view/69111 |