OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM

BERT, a model composed of Transformer layers, has been a game changer for the field of natural language processing (NLP). Much work has gone into speeding up its training, but comparatively little effort has been made to improve its inference performance. Moreover, not all machine learning...

Bibliographic Details
Main Author: Legowo, Setyo
Format: Theses
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/56144
Institution: Institut Teknologi Bandung
id id-itb.:56144
spelling id-itb.:56144 2021-06-21T13:28:03Z OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM Legowo, Setyo Indonesia Theses optimization, bert, transformer, apache tvm, cpu x86 INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/56144 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description BERT, a model composed of Transformer layers, has been a game changer for the field of natural language processing (NLP). Much work has gone into speeding up its training, but comparatively little effort has been made to improve its inference performance. Moreover, the machine learning frameworks that modelers commonly use cannot always run, let alone optimize, a model on a hardware target those frameworks do not support. Apache TVM offers clear benefits for optimizing virtually any machine learning model, beyond its primary objective of converting models implemented in various ML frameworks so that they can run on any backend. We therefore analyze several factors that affect the inference time of BERT after conversion by TVM: the SIMD extensions available on the CPU, the library used by the TVM runtime, and the combination of CPU cores and threads. With well-chosen settings, we achieve a 44% speedup over the default implementation. (See the sketch after this record for how these factors are typically set.)
format Theses
author Legowo, Setyo
spellingShingle Legowo, Setyo
OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
author_facet Legowo, Setyo
author_sort Legowo, Setyo
title OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_short OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_full OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_fullStr OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_full_unstemmed OPTIMIZING INFERENCE PERFORMANCE OF BERT ON CPUS USING APACHE TVM
title_sort optimizing inference performance of bert on cpus using apache tvm
url https://digilib.itb.ac.id/gdl/view/56144
_version_ 1822930109024174080
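
The description names three tunables for the compiled model: the SIMD extensions available on the CPU, the library linked into the TVM runtime, and the core/thread configuration. Below is a minimal sketch, not the thesis' code, of how those knobs are typically set when compiling BERT through TVM's Relay PyTorch frontend; the checkpoint name, target string, thread count, and sequence length are illustrative assumptions.

import os
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from transformers import BertModel, BertTokenizer

# Factor 3 (cores/threads): pin the TVM thread pool size before the
# runtime starts. "4" is an arbitrary example value.
os.environ["TVM_NUM_THREADS"] = "4"

# Trace BERT to TorchScript; the Relay frontend consumes traced modules.
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("An example sentence.", return_tensors="pt",
                padding="max_length", max_length=128)
traced = torch.jit.trace(model, (enc["input_ids"], enc["attention_mask"]))

# Import into Relay with fixed input shapes and integer dtypes.
shape = list(enc["input_ids"].shape)
mod, params = relay.frontend.from_pytorch(
    traced,
    [("input_ids", (shape, "int64")), ("attention_mask", (shape, "int64"))],
)

# Factor 1 (SIMD): -mcpu selects the vector extension (here AVX-512).
# Factor 2 (runtime library): -libs=mkl offloads dense/batch_matmul to MKL,
# assuming TVM was built with MKL support; drop it otherwise.
target = "llvm -mcpu=skylake-avx512 -libs=mkl"

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("input_ids", enc["input_ids"].numpy())
runtime.set_input("attention_mask", enc["attention_mask"].numpy())

# Time the compiled graph so different settings can be compared.
timer = runtime.module.time_evaluator("run", dev, number=10, repeat=3)
print("mean inference time: %.2f ms" % (timer().mean * 1e3))

Sweeping the -mcpu value, the -libs option, and TVM_NUM_THREADS over this harness is one plausible way to reproduce the kind of comparison the abstract reports.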