Pre-training model based on the transfer learning in natural language processing
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/78688
Institution: Nanyang Technological University
Summary: Transfer learning applies knowledge or patterns learned in one field or task to different but related areas or problems. It is especially valuable when data are scarce and when domain distributions differ. In natural language processing, transfer learning is embodied in pre-trained models. There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based (ELMo) and fine-tuning (GPT, BERT).
In 2018, Google released BERT, a large-scale pre-trained language model whose name stands for Bidirectional Encoder Representations from Transformers. Compared with the earlier pre-trained models ELMo and GPT, and with classical CNN models, BERT was the best-performing model at the time of writing. Its highlights are (1) a bidirectional Transformer, (2) the masked language model, (3) next sentence prediction, and (4) more general input and output layers. The BERT model can efficiently learn textual information and be applied to a wide range of NLP tasks.
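The masked-language-model objective can be illustrated with a short sketch that is not part of the thesis itself; it assumes the Hugging Face transformers library and the publicly released bert-base-uncased checkpoint, neither of which the report specifies.

```python
# Illustrative sketch only: assumes the Hugging Face "transformers" library and
# the public bert-base-uncased checkpoint (not specified in the thesis).
from transformers import pipeline

# BERT's masked-language-model head predicts the token hidden behind [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for candidate in unmasker("Transfer learning is very useful when labeled [MASK] is scarce."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```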
In this report, we use the BERT model in two ways. The first is to take the pre-trained model released by Google and fine-tune it directly on the downstream task. The second is to use bert-as-service to employ BERT as a sentence encoder, followed by a DNN classifier, as sketched below. We then compare BERT horizontally against ELMo and GPT, and vertically across BERT configurations with different parameters.
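A minimal sketch of the second approach follows, assuming a bert-as-service server is already running locally and using a small Keras classifier; the toy data, layer sizes, and training settings are illustrative assumptions, not the configuration used in the report.

```python
# Minimal sketch: BERT as a fixed sentence encoder via bert-as-service, followed
# by a small DNN classifier. Assumes a bert-serving server is running locally;
# the toy data and layer sizes are illustrative, not the report's configuration.
import numpy as np
from bert_serving.client import BertClient
from tensorflow import keras

texts = ["great movie, would watch again", "terrible plot and acting"]
labels = np.array([1, 0])  # toy binary sentiment labels

# Each sentence becomes a fixed-length vector (768 dimensions for BERT-Base).
bc = BertClient()             # connects to the local bert-serving server
features = bc.encode(texts)   # shape: (num_sentences, 768)

# Simple feed-forward classifier on top of the frozen BERT sentence embeddings.
model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(features.shape[1],)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=5, batch_size=2)
```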