TFIDF meets deep document representation: a re-visit of co-training for text classification
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2020
Subjects:
Online Access: https://hdl.handle.net/10356/138643
Institution: Nanyang Technological University
Summary: Many text classification tasks face the challenge of insufficient labelled data. The co-training algorithm is a candidate solution, which learns from both labelled and unlabelled data for better classification accuracy. However, in the past, two sufficient and redundant views of an instance were often not available to fully facilitate co-training. With the recent development of deep learning, we now have both traditional TFIDF representations and deep representations for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroups and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representations improves classification accuracy.
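The two-view co-training scheme the summary describes can be sketched as follows. This is a minimal, hypothetical illustration in the style of Blum and Mitchell's algorithm: synthetic feature matrices and logistic regression stand in for the TFIDF/deep representations and the classifiers the project actually evaluates, and all names here are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xa, Xb, y, labeled, rounds=5, k=5):
    """Grow the labelled pool by pseudo-labelling confident predictions.

    Xa, Xb  : the two views of every instance (e.g. TFIDF and a deep
              embedding in the paper; synthetic features here)
    y       : labels; only entries where `labeled` is True act as seeds
    """
    labeled = labeled.copy()
    pseudo = y.copy()                      # seed labels + pseudo-labels
    clf_a = LogisticRegression(max_iter=1000)
    clf_b = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        # Each classifier trains only on its own view of the pool.
        clf_a.fit(Xa[labeled], pseudo[labeled])
        clf_b.fit(Xb[labeled], pseudo[labeled])
        for clf, X in ((clf_a, Xa), (clf_b, Xb)):
            unlabeled = np.where(~labeled)[0]
            if unlabeled.size == 0:
                return clf_a, clf_b
            proba = clf.predict_proba(X[unlabeled])
            # Move the k most confidently predicted instances into the
            # labelled pool with their predicted (pseudo) labels.
            top = unlabeled[np.argsort(proba.max(axis=1))[-k:]]
            pseudo[top] = clf.predict(X[top])
            labeled[top] = True
    return clf_a, clf_b

# Synthetic two-view data: each view alone is informative about y.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
Xa = y[:, None] + 0.5 * rng.standard_normal((n, 3))  # "TFIDF-like" view
Xb = y[:, None] + 0.5 * rng.standard_normal((n, 3))  # "deep" view
seed = np.zeros(n, dtype=bool)
seed[np.where(y == 0)[0][:5]] = True   # only 10 labelled seeds in total
seed[np.where(y == 1)[0][:5]] = True
clf_a, clf_b = co_train(Xa, Xb, y, seed)
accuracy = float((clf_a.predict(Xa) == y).mean())
```

The key design choice, as in the paper's setting, is that the two views are sufficient and redundant: each classifier can learn from its own view alone, so confident predictions from one view supply useful pseudo-labels for training on the other.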