Performance profiling and optimizations in distributed deep learning frameworks
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2021
Subjects:
Online Access: https://hdl.handle.net/10356/149382
Institution: Nanyang Technological University
Summary: Deep learning has become a very popular topic in the Artificial Intelligence industry in recent years and can be applied to many fields, such as computer vision and natural language processing. However, training a deep learning model is usually time-consuming. It is therefore necessary to identify the bottlenecks in the deep learning workflow and apply optimizations to them to improve training efficiency, especially training speed. Optimizations are usually applied in two areas: data processing and model training.
In this work, multiple optimization methods are studied and evaluated for their effect. For data processing, optimizations such as parallelizing the data-transformation steps, caching the dataset, and prefetching data samples are implemented. For model training, data parallelism in distributed training is studied in particular, and two currently popular frameworks are used to achieve it. Experiments are conducted to compare the two frameworks and to analyze how possible influencing factors affect training speed.
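The abstract does not name the data-processing library used. As a minimal sketch, assuming a TensorFlow tf.data input pipeline, the three optimizations it lists (parallel transforms, dataset caching, prefetching) might be combined like this; the file pattern, image size, and batch size below are illustrative placeholders, not values from the project:

```python
import tensorflow as tf

def decode_and_resize(path):
    """One transformation step: load an image file and resize it."""
    image = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image, channels=3)
    return tf.image.resize(image, [224, 224])

# Placeholder listing of the training files.
dataset = tf.data.Dataset.list_files("data/train/*.jpg")

dataset = (
    dataset
    # Parallelize the transformation across multiple worker threads.
    .map(decode_and_resize, num_parallel_calls=tf.data.AUTOTUNE)
    # Cache decoded samples so later epochs skip the transform cost.
    .cache()
    .batch(32)
    # Prefetch upcoming batches so the accelerator never waits on input I/O.
    .prefetch(tf.data.AUTOTUNE)
)
```

The ordering matters: mapping before caching means the transform runs only once, and prefetching last overlaps input preparation with the training step.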
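The abstract also leaves the two distributed-training frameworks unnamed. Purely as an illustration of the data-parallelism pattern it describes, the sketch below uses PyTorch DistributedDataParallel, which is one plausible choice but not confirmed to be one of the frameworks the project compared; the model, dataset, epoch count, and hyperparameters are placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train(model, dataset):
    # Each process drives one GPU; rank and world size come from the launcher
    # (e.g. torchrun), which sets the environment variables read here.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = model.cuda(local_rank)
    # DDP keeps one model replica per GPU and all-reduces gradients each step.
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler shards the dataset so each replica sees a distinct slice.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for inputs, labels in loader:
            inputs = inputs.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()  # gradients synchronize across replicas here
            optimizer.step()

    dist.destroy_process_group()
```

In this pattern the per-step synchronization cost of the gradient all-reduce is one of the influencing factors on training speed that experiments like those described in the abstract typically measure.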