TFIDF meets deep document representation : a re-visit of co-training for text classification

Many text classification tasks face the challenge of lack of sufficient labelled data. Co-training algorithm is a candidate solution, which learns from both labeled and unlabelled data for better classification accuracy. However, two sufficient and redundant views of an instance are often not avai...

Bibliographic Details
Main Author:	Chen, Zhiwei
Other Authors:	Sun Aixin
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	https://hdl.handle.net/10356/138643
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-138643
record_format	dspace
spelling	sg-ntu-dr.10356-1386432020-05-11T06:18:13Z TFIDF meets deep document representation : a re-visit of co-training for text classification Chen, Zhiwei Sun Aixin School of Computer Science and Engineering AXSun@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Document and text processing Many text classification tasks face the challenge of lack of sufficient labelled data. Co-training algorithm is a candidate solution, which learns from both labeled and unlabelled data for better classification accuracy. However, two sufficient and redundant views of an instance are often not available to fully facilitate co-training in the past. With the recent development of deep learning, we now have both traditional TFIDF representation and deep representation for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroup and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representation offers improvement to classification accuracy. Bachelor of Engineering (Computer Science) 2020-05-11T06:18:13Z 2020-05-11T06:18:13Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/138643 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Document and text processing
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Document and text processing Chen, Zhiwei TFIDF meets deep document representation : a re-visit of co-training for text classification
description	Many text classification tasks face the challenge of lack of sufficient labelled data. Co-training algorithm is a candidate solution, which learns from both labeled and unlabelled data for better classification accuracy. However, two sufficient and redundant views of an instance are often not available to fully facilitate co-training in the past. With the recent development of deep learning, we now have both traditional TFIDF representation and deep representation for documents. In this paper, we conduct experiments to evaluate the effectiveness of co-training with different combinations of document representations (e.g., TFIDF, Doc2vec, ELMo, BERT) and classifiers (e.g., SVM, Random Forest, XGBoost, MLP, and CNN) on two benchmark datasets (20 Newsgroup and Ohsumed). Our results show that co-training with TFIDF and deep contextualised representation offers improvement to classification accuracy.
author2	Sun Aixin
author_facet	Sun Aixin Chen, Zhiwei
format	Final Year Project
author	Chen, Zhiwei
author_sort	Chen, Zhiwei
title	TFIDF meets deep document representation : a re-visit of co-training for text classification
title_short	TFIDF meets deep document representation : a re-visit of co-training for text classification
title_full	TFIDF meets deep document representation : a re-visit of co-training for text classification
title_fullStr	TFIDF meets deep document representation : a re-visit of co-training for text classification
title_full_unstemmed	TFIDF meets deep document representation : a re-visit of co-training for text classification
title_sort	tfidf meets deep document representation : a re-visit of co-training for text classification
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/138643
_version_	1681059637090058240
