Transformers acceleration on autoNLP document classification

Bibliographic Details
Main Author: Cao, Hannan
Other Authors: Sinno Jialin Pan
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/138506
Institution: Nanyang Technological University
Language: English
Physical Description
Summary: Unsupervised pre-training has been widely used in Natural Language Processing: a large network is trained on unsupervised prediction tasks, with BERT being one representative model. BERT has achieved great success on a variety of NLP downstream tasks, reaching state-of-the-art results on major benchmarks. However, BERT has more than 110M parameters, which demand a large amount of training time and computing resources, so weight reduction is critical for training BERT efficiently. In this Final Year Project, we first explored BERT's performance on Document Classification. We then proposed a new method that reduces BERT's weights and training time by means of weight pruning; our experiments show that the new method reduces the required training time by about 20% while achieving higher performance than the original BERT. We also applied an ensemble method to the pruned networks to further increase performance, improving the baseline by about 2% on the AAPD, Reuters, and IMDB datasets.
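The record does not reproduce the project's implementation. As a rough illustration of the two techniques the summary mentions, the sketch below applies magnitude-based (L1) unstructured pruning to the linear layers of a BERT classifier and averages the logits of several pruned copies to form an ensemble. The model name, sparsity levels, and label count are placeholders, and the FYP's actual pruning schedule and ensembling procedure may differ.

```python
# Minimal sketch (not the FYP's actual implementation): L1 unstructured
# pruning of a BERT classifier's encoder, plus logit averaging over
# several pruned copies as a simple ensemble.
import torch
import torch.nn.utils.prune as prune
from transformers import BertForSequenceClassification

def prune_bert_encoder(model, amount=0.3):
    """Zero out the `amount` fraction of smallest-magnitude weights in
    every linear layer of the encoder."""
    for module in model.bert.encoder.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model

def ensemble_logits(models, input_ids, attention_mask):
    """Average the logits of several (pruned and fine-tuned) classifiers."""
    with torch.no_grad():
        outs = [m(input_ids=input_ids, attention_mask=attention_mask).logits
                for m in models]
    return torch.stack(outs).mean(dim=0)

# Hypothetical usage: three pruned copies at different sparsity levels,
# e.g. for binary sentiment classification on IMDB (num_labels=2).
models = [
    prune_bert_encoder(
        BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       num_labels=2),
        amount=a,
    )
    for a in (0.2, 0.3, 0.4)
]
```

In practice each pruned copy would be fine-tuned on the target dataset before its logits are averaged; pruning and then retraining is what allows the sparser networks to recover (and, per the summary, exceed) the dense baseline's accuracy.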