OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM

BERT, a model composed of Transformer layers, has been a game changer for the field of natural language processing (NLP). Much work has gone into speeding up its training, but comparatively little effort has been made to improve its inference performance. Moreover, not all machine learning...

Bibliographic Details
Main Author: Legowo, Setyo
Format: Theses
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/56144
Institution: Institut Teknologi Bandung
id id-itb.:56144
spelling id-itb.:56144 2021-06-21T13:28:03Z OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM Legowo, Setyo Indonesia Theses optimization, bert, transformer, apache tvm, cpu x86 INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/56144 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description BERT, a model composed of Transformer layers, has been a game changer for the field of natural language processing (NLP). Much work has gone into speeding up its training, but comparatively little effort has been made to improve its inference performance. Moreover, the machine learning frameworks that modelers commonly use cannot always run, let alone optimize, a model on a hardware target those frameworks do not support. Apache TVM offers clear benefits for optimizing virtually any machine learning model, beyond its primary objective of converting models implemented in various ML frameworks so that they can run on any backend. We therefore analyze several factors that affect the inference time of BERT after conversion by TVM: the SIMD extensions available on the CPU, the library used by the TVM runtime, and the combination of CPU cores and threads. With well-chosen settings, we achieve a 44% speedup over the default implementation. (See the sketch after this record for how these factors are typically set.)
format Theses
author Legowo, Setyo
spellingShingle Legowo, Setyo
OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
author_facet Legowo, Setyo
author_sort Legowo, Setyo
title OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_short OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_full OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_fullStr OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_full_unstemmed OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_sort optimizing inference performance of bert on cpus using apache tvm
url https://digilib.itb.ac.id/gdl/view/56144
_version_ 1822930109024174080
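
The description names three tunables for the compiled model: the SIMD extensions available on the CPU, the library linked into the TVM runtime, and the core/thread configuration. Below is a minimal sketch, not the thesis' code, of how those knobs are typically set when compiling BERT through TVM's Relay PyTorch frontend; the checkpoint name, target string, thread count, and sequence length are illustrative assumptions.

import os
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from transformers import BertModel, BertTokenizer

# Factor 3 (cores/threads): pin the TVM thread pool size before the
# runtime starts. "4" is an arbitrary example value.
os.environ["TVM_NUM_THREADS"] = "4"

# Trace BERT to TorchScript; the Relay frontend consumes traced modules.
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("An example sentence.", return_tensors="pt",
                padding="max_length", max_length=128)
traced = torch.jit.trace(model, (enc["input_ids"], enc["attention_mask"]))

# Import into Relay with fixed input shapes and integer dtypes.
shape = list(enc["input_ids"].shape)
mod, params = relay.frontend.from_pytorch(
    traced,
    [("input_ids", (shape, "int64")), ("attention_mask", (shape, "int64"))],
)

# Factor 1 (SIMD): -mcpu selects the vector extension (here AVX-512).
# Factor 2 (runtime library): -libs=mkl offloads dense/batch_matmul to MKL,
# assuming TVM was built with MKL support; drop it otherwise.
target = "llvm -mcpu=skylake-avx512 -libs=mkl"

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("input_ids", enc["input_ids"].numpy())
runtime.set_input("attention_mask", enc["attention_mask"].numpy())

# Time the compiled graph so different settings can be compared.
timer = runtime.module.time_evaluator("run", dev, number=10, repeat=3)
print("mean inference time: %.2f ms" % (timer().mean * 1e3))

Sweeping the -mcpu value, the -libs option, and TVM_NUM_THREADS over this harness is one plausible way to reproduce the kind of comparison the abstract reports.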