OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
BERT, a model composed of Transformer layers, has been game-changing for the field of natural language processing (NLP). Many studies have sought to speed up training of the model, but relatively little effort has gone into improving its inference performance. Also, not all machine learning...
Main Author: | Legowo, Setyo |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/56144 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:56144 |
---|---|
spelling |
id-itb.:56144; 2021-06-21T13:28:03Z; OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM; Legowo, Setyo; Indonesia; Theses; optimization, bert, transformer, apache tvm, cpu x86; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/56144; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
BERT, a model composed of Transformer layers, has been game-changing for the field
of natural language processing (NLP). Many studies have sought to speed up
training of the model, but relatively little effort has gone into improving its
inference performance. In addition, the machine learning frameworks commonly
used by modelers cannot always run on, or be optimized for, a particular target that
those frameworks do not support. Apache TVM's primary objective is to convert
models already implemented in various ML frameworks so that they can run on
any backend, and it also offers ways to optimize any kind of machine learning
model. We therefore analyze several factors that affect the inference time
of BERT after conversion by TVM: the SIMD extensions available on the CPU, the
library used by the TVM runtime, and the combination of CPU cores and threads.
With good settings, we obtain a 44% speedup over the default implementation. |
format |
Theses |
author |
Legowo, Setyo |
title |
OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM |
url |
https://digilib.itb.ac.id/gdl/view/56144 |
_version_ |
1822930109024174080 |
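
The three factors named in the description (SIMD extensions, the library used by the TVM runtime, and the core/thread combination) can be laid out as a small benchmark grid. The sketch below is only illustrative: the record does not list the settings the thesis actually tested, so the CPU names, library names, and thread counts are assumptions, and `target_string` and `config_grid` are hypothetical helpers. The strings follow the LLVM target format that TVM accepts (`-mcpu`, `-libs`), and `TVM_NUM_THREADS` is the environment variable the TVM runtime reads for its thread pool size.

```python
from itertools import product

# Assumed example values; the abstract does not state the exact settings tested.
SIMD_CPUS = ["core-avx2", "skylake-avx512"]  # LLVM -mcpu names enabling AVX2 / AVX-512
LIBRARIES = [None, "cblas", "mkl"]           # None = rely on TVM's own generated kernels
THREAD_COUNTS = [1, 2, 4]                    # would be exported as TVM_NUM_THREADS


def target_string(mcpu, lib=None):
    """Build a TVM-style LLVM target string for one CPU/library choice."""
    parts = ["llvm", f"-mcpu={mcpu}"]
    if lib is not None:
        parts.append(f"-libs={lib}")
    return " ".join(parts)


def config_grid(cpus=SIMD_CPUS, libs=LIBRARIES, threads=THREAD_COUNTS):
    """Enumerate every (target string, thread count) combination to benchmark."""
    return [
        {"target": target_string(cpu, lib), "env": {"TVM_NUM_THREADS": str(n)}}
        for cpu, lib, n in product(cpus, libs, threads)
    ]


if __name__ == "__main__":
    # Print each configuration; a real benchmark would compile and time BERT per entry.
    for cfg in config_grid():
        print(cfg["target"], cfg["env"])
```

Each entry's `target` would be passed to the TVM build step and its `env` set before the runtime starts; timing every entry against the default configuration is one way to arrive at a relative speedup figure like the one reported.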