Pre-training model based on the transfer learning in natural language processing

Bibliographic Details
Main Author: Tang, Jiayi
Other Authors: Mao Kezhi
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
Online Access:http://hdl.handle.net/10356/78688
Institution: Nanyang Technological University
Description
Summary: Transfer learning applies knowledge or patterns learned in one field or task to different but related fields or problems. It is particularly valuable when data is scarce or when domain distributions are heterogeneous. In natural language processing, transfer learning is embodied in pre-training models. There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based (ELMo) and fine-tuning (GPT, BERT). In 2018, Google released BERT, a large-scale pre-trained language model whose name stands for Bidirectional Encoder Representations from Transformers. Compared with the other pre-training models ELMo and GPT, and with the classical CNN model, BERT is the most recent and best-performing model to date. Its highlights are (1) the bidirectional Transformer, (2) the masked language model, (3) next sentence prediction, and (4) a more general input and output layer. The BERT model can efficiently learn textual information and apply it to various NLP tasks. In this report, we use the BERT model in two ways. The first is to take the pre-trained model released by Google and fine-tune it directly on the downstream task. The second is to use bert-as-service to employ BERT as a sentence encoder followed by a DNN classifier. We then compare BERT horizontally against ELMo and GPT, and vertically against BERT configurations with different parameters.
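To make the two usage modes described in the summary concrete, the following are minimal sketches, not the thesis's actual code. The first sketch illustrates fine-tuning a released BERT checkpoint on a sentence classification task; the thesis uses Google's released TensorFlow model and code, whereas here the Hugging Face transformers library, the bert-base-uncased checkpoint, the toy sentences, labels, and learning rate are all stand-in assumptions.

# Sketch of approach 1: fine-tuning a pre-trained BERT checkpoint for
# sentence classification (illustrative stand-in for Google's released code).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["the movie was wonderful", "the plot made no sense"]   # toy examples
labels = torch.tensor([1, 0])

# Tokenize and pad the batch, then run one fine-tuning step.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)   # the loss is computed internally
outputs.loss.backward()
optimizer.step()

The second sketch illustrates the bert-as-service workflow: a frozen BERT model serves fixed-length sentence encodings, and a separate classifier is trained on top. It assumes a bert-serving-server instance is already running locally; the scikit-learn MLPClassifier and its hidden-layer sizes are assumptions standing in for the DNN classifier used in the report.

# Sketch of approach 2: BERT as a frozen sentence encoder via bert-as-service,
# followed by a simple feed-forward classifier on the encodings.
import numpy as np
from bert_serving.client import BertClient
from sklearn.neural_network import MLPClassifier

bc = BertClient()                        # connects to the running server
sentences = ["the movie was wonderful", "the plot made no sense"]
labels = np.array([1, 0])

features = bc.encode(sentences)          # one 768-dim vector per sentence (BERT-base)

clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=300)
clf.fit(features, labels)
print(clf.predict(bc.encode(["what a great film"])))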