Clustering together with learning representations
Document clustering is a useful and practical machine learning technique with real-world applications such as search optimization, document recommendation, and tag generation for papers and records. It arranges a batch of PDF documents into separate subgroups...
Main Author: | Yu, Shuaiqi |
---|---|
Other Authors: | Lihui Chen |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/158048 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-158048 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-158048; 2023-07-07T19:26:15Z; Clustering together with learning representations; Yu, Shuaiqi; Lihui Chen; School of Electrical and Electronic Engineering; ELHCHEN@ntu.edu.sg; Engineering::Electrical and electronic engineering; Bachelor of Engineering (Electrical and Electronic Engineering); 2022-05-26T06:45:12Z; 2022; Final Year Project (FYP); Yu, S. (2022). Clustering together with learning representations. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158048; en; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Electrical and electronic engineering |
spellingShingle | Engineering::Electrical and electronic engineering; Yu, Shuaiqi; Clustering together with learning representations |
description | Document clustering is a useful and practical machine learning technique with real-world applications such as search optimization, document recommendation, and tag generation for papers and records. It arranges a batch of PDF documents into separate subgroups. To cluster more efficiently, we introduce representation learning, an unsupervised approach that learns features from unlabeled data on its own. In this project, we implement and study a series of representation learning methods that are better suited to clustering web documents, such as the Reuters-10k dataset. Specifically, the deep fuzzy clustering method GrDNFCS has been implemented and explored to reproduce the automatic categorization of web documents reported in the original paper. A new approach named CLDFC, in which a contrastive loss is introduced into GrDNFCS, is proposed and designed to improve clustering accuracy. Based on our preliminary study, CLDFC shows a 2.5% improvement in accuracy and reduces training time by an average of 60 s per epoch compared with GrDNFCS. Experiments on several other clustering models will be included for comparison. |
author2 | Lihui Chen |
author_facet | Lihui Chen; Yu, Shuaiqi |
format | Final Year Project |
author | Yu, Shuaiqi |
author_sort | Yu, Shuaiqi |
title | Clustering together with learning representations |
publisher | Nanyang Technological University |
publishDate | 2022 |
url | https://hdl.handle.net/10356/158048 |
_version_ | 1772828116829339648 |
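
The description in this record says that a contrastive loss is introduced into a deep fuzzy clustering objective. The record does not give either loss, so the following is only a rough sketch of the general idea, not the project's implementation. It assumes an NT-Xent (SimCLR-style) contrastive term and a plain fuzzy c-means compactness term as a stand-in for the GrDNFCS objective. All function names, the `weight` parameter, and the `temperature` default are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent contrastive loss (an assumed choice, not taken from the record).

    view_a[i] and view_b[i] are embeddings of two views of the same document
    (a positive pair); all other embeddings in the batch act as negatives.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of z[i]
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / (2 * n)


def fuzzy_memberships(z, centroids, m=2.0):
    """Standard fuzzy c-means memberships; each row sums to 1."""
    memberships = []
    for point in z:
        # Clamp distances so a point lying exactly on a centroid does not divide by zero.
        d = [max(math.dist(point, c), 1e-12) for c in centroids]
        row = [1.0 / sum((d_k / d_j) ** (2.0 / (m - 1.0)) for d_j in d) for d_k in d]
        memberships.append(row)
    return memberships


def fuzzy_compactness(z, centroids, m=2.0):
    """Membership-weighted squared distance to the centroids (a stand-in clustering term)."""
    u = fuzzy_memberships(z, centroids, m)
    total = sum(
        (u[i][k] ** m) * math.dist(z[i], centroids[k]) ** 2
        for i in range(len(z))
        for k in range(len(centroids))
    )
    return total / len(z)


def joint_loss(view_a, view_b, centroids, weight=1.0):
    """Clustering term plus a weighted contrastive term (hypothetical combination)."""
    return fuzzy_compactness(view_a, centroids) + weight * nt_xent(view_a, view_b)


# Toy embeddings: two documents, each seen through two slightly different views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
centroids = [[0.8, 0.2], [0.2, 0.8]]
loss = joint_loss(view_a, view_b, centroids)
```

In a real model the embeddings would come from an encoder network and the combined loss would be minimized by gradient descent; the toy lists here only show how the two terms fit together.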