Clustering together with learning representations

Document clustering is a practical machine learning technique with real-world applications such as search optimization, document recommendation, and tag generation for papers and records. It arranges a batch of PDF documents into separate subgroups...

Bibliographic Details
Main Author: Yu, Shuaiqi
Other Authors: Lihui Chen
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/158048
Institution: Nanyang Technological University
id sg-ntu-dr.10356-158048
record_format dspace
spelling sg-ntu-dr.10356-158048 2023-07-07T19:26:15Z Clustering together with learning representations Yu, Shuaiqi Lihui Chen School of Electrical and Electronic Engineering ELHCHEN@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2022-05-26T06:45:12Z 2022-05-26T06:45:12Z 2022 Final Year Project (FYP) Yu, S. (2022). Clustering together with learning representations. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158048 https://hdl.handle.net/10356/158048 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
description Document clustering is a practical machine learning technique with real-world applications such as search optimization, document recommendation, and tag generation for papers and records. It arranges a batch of PDF documents into separate subgroups. To cluster more efficiently, we introduce representation learning, an unsupervised approach that learns features from unlabeled data. In this project, we implement and study a series of representation learning methods that are well suited to clustering web documents, such as the Reuters-10k dataset. Specifically, the deep fuzzy clustering method GrDNFCS has been implemented and explored to reproduce the automatic categorization of web documents reported in the original paper. A new approach named CLDFC, which introduces a contrastive loss into GrDNFCS, is proposed and designed to improve clustering accuracy. Based on our preliminary study, CLDFC shows a 2.5% improvement in accuracy and reduces training time by an average of 60 s per epoch compared with GrDNFCS. Experiments on several other clustering models will be included for comparison.
author2 Lihui Chen
author_facet Lihui Chen
Yu, Shuaiqi
format Final Year Project
author Yu, Shuaiqi
author_sort Yu, Shuaiqi
title Clustering together with learning representations
title_sort clustering together with learning representations
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/158048
_version_ 1772828116829339648