TFIDF meets deep document representation : a re-visit of co-training for text classification
Many text classification tasks lack sufficient labelled data. The co-training algorithm is a candidate solution: it learns from both labelled and unlabelled data to improve classification accuracy. However, two sufficient and redundant views of an instance are often not avai...
Main Author: | Chen, Zhiwei |
---|---|
Other Authors: | Sun Aixin |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
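Subjects: | Engineering::Computer science and engineering::Computing methodologies::Document and text processing |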
Online Access: | https://hdl.handle.net/10356/138643 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-138643 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-138643 2020-05-11T06:18:13Z TFIDF meets deep document representation : a re-visit of co-training for text classification Chen, Zhiwei Sun Aixin School of Computer Science and Engineering AXSun@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Document and text processing Many text classification tasks lack sufficient labelled data. The co-training algorithm is a candidate solution: it learns from both labelled and unlabelled data to improve classification accuracy. In the past, however, two sufficient and redundant views of an instance were often not available to fully support co-training. With the recent development of deep learning, both the traditional TFIDF representation and deep representations are now available for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroups and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representations improves classification accuracy. Bachelor of Engineering (Computer Science) 2020-05-11T06:18:13Z 2020-05-11T06:18:13Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/138643 en application/pdf Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
country | Singapore |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
description | Many text classification tasks lack sufficient labelled data. The co-training algorithm is a candidate solution: it learns from both labelled and unlabelled data to improve classification accuracy. In the past, however, two sufficient and redundant views of an instance were often not available to fully support co-training. With the recent development of deep learning, both the traditional TFIDF representation and deep representations are now available for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroups and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representations improves classification accuracy. |
author2 | Sun Aixin |
author_facet | Sun Aixin Chen, Zhiwei |
format | Final Year Project |
author | Chen, Zhiwei |
author_sort | Chen, Zhiwei |
title | TFIDF meets deep document representation : a re-visit of co-training for text classification |
publisher | Nanyang Technological University |
publishDate | 2020 |
url | https://hdl.handle.net/10356/138643 |
_version_ | 1681059637090058240 |
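
The description above refers to co-training over two views of each document. The following is a minimal Python sketch of that general idea, not the project's implementation. It rests on several assumptions: synthetic Gaussian vectors stand in for the TFIDF and deep representations, a nearest-centroid classifier stands in for the classifiers named in the record, both views feed one shared pool of pseudo-labels, and the names `centroid_fit`, `centroid_predict`, `co_train`, and `make_demo` are hypothetical.

```python
import random


def centroid_fit(X, y):
    """Fit a nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centroid_predict(model, x):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, centre)), label)
        for label, centre in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view_a, view_b, seed_labels, rounds=30, per_round=2):
    """Grow a shared labelled pool using one classifier per view.

    seed_labels maps document index -> label for the initially labelled documents.
    Each round, each view's classifier pseudo-labels its most confident
    unlabelled documents, and those labels are added to the shared pool.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        unlabelled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabelled:
            break
        new_labels = {}
        for view in (view_a, view_b):
            idx = list(labels)
            model = centroid_fit([view[i] for i in idx], [labels[i] for i in idx])
            preds = [(i, *centroid_predict(model, view[i])) for i in unlabelled]
            preds.sort(key=lambda p: p[2], reverse=True)  # most confident first
            for i, label, _ in preds[:per_round]:
                # If both views pick the same document, the first view's label is kept.
                new_labels.setdefault(i, label)
        labels.update(new_labels)
    return labels


def make_demo(n_per_class=20, seed=0):
    """Build two synthetic views of the same documents (illustrative data only)."""
    rng = random.Random(seed)
    view_a, view_b, truth = [], [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            view_a.append([rng.gauss(10 * label, 1.0) for _ in range(2)])
            view_b.append([rng.gauss(-10 * label, 1.0) for _ in range(3)])
            truth.append(label)
    return view_a, view_b, truth


view_a, view_b, truth = make_demo()
seed_labels = {0: 0, 1: 0, 20: 1, 21: 1}  # two labelled documents per class
final = co_train(view_a, view_b, seed_labels)
accuracy = sum(final[i] == truth[i] for i in range(len(truth))) / len(truth)
print(f"labelled {len(final)} of {len(truth)} documents, accuracy {accuracy:.2f}")
```

In a realistic setting, `view_a` and `view_b` would be replaced by two genuinely different representations of the same documents, and the classifier by one suited to each view; the loop structure is the part this sketch is meant to illustrate.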