An efficient parallel clustering algorithm on big data using Spark

Clustering is a useful tool for dealing with large amounts of data. On larger datasets, however, typical algorithms become inefficient, mainly because most of them do not scale to large data sets or high dimensionality. Furthermore, they are only capable of handling organized d...

Bibliographic Details
Main Authors:	Mallik, Moksud Alam; Zulkurnain, Nurul Fariza; Nizamuddin, Mohammed Khaja; Sarkar, Rashel; Ahmed, S K Jamil
Format:	Article
Language:	English
Published:	East China University of Science and Technology 2022
Subjects:	TK7885 Computer engineering
Online Access:	http://irep.iium.edu.my/99923/1/99923_An%20efficient%20parallel%20clustering%20algorithm.pdf http://irep.iium.edu.my/99923/ http://hdlgdxxb.info/index.php/JE_CUST/article/view/106
Institution:	Universiti Islam Antarabangsa Malaysia

id	my.iium.irep.99923
record_format	dspace
spelling	my.iium.irep.99923 2022-09-15T05:40:20Z http://irep.iium.edu.my/99923/ Mallik, Moksud Alam and Zulkurnain, Nurul Fariza and Nizamuddin, Mohammed Khaja and Sarkar, Rashel and Ahmed, S K Jamil (2022) An efficient parallel clustering algorithm on big data using Spark. Journal of East China University of Science and Technology, 65 (2). pp. 535-547. ISSN 1006-3080. Article, PeerReviewed, application/pdf, en. http://hdlgdxxb.info/index.php/JE_CUST/article/view/106 DOI: 10.5281/ZENODO.6730602
institution	Universiti Islam Antarabangsa Malaysia
building	IIUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	International Islamic University Malaysia
content_source	IIUM Repository (IREP)
url_provider	http://irep.iium.edu.my/
language	English
topic	TK7885 Computer engineering
description	Clustering is a useful tool for dealing with large amounts of data. On larger datasets, however, typical algorithms become inefficient, mainly because most of them do not scale to large data sets or high dimensionality. Furthermore, they are only capable of handling organized data. Every second, data pours in from numerous streams such as log files, social media, and YouTube. Because of the increasing volume and variety of data on the internet, we need a refined parallel clustering algorithm that is both efficient and effective for Big Data. There are mainly two frameworks for processing big data: MapReduce and Spark. Spark is the future of the big data platform; it can be up to 100 times faster than MapReduce. Here we propose a new parallel fuzzy clustering algorithm, "An efficient parallel clustering algorithm on big data using Spark", which deals with real-time processing. The proposed algorithm provides fast, iterative data processing and eliminates the effect of batch processing.
format	Article
author	Mallik, Moksud Alam; Zulkurnain, Nurul Fariza; Nizamuddin, Mohammed Khaja; Sarkar, Rashel; Ahmed, S K Jamil
author_sort	Mallik, Moksud Alam
title	An efficient parallel clustering algorithm on big data using Spark
publisher	East China University of Science and Technology
publishDate	2022
_version_	1744353533624320000

Similar Items