A spark-based parallel fuzzy C median algorithm for web log big data

Nowadays, the World Wide Web (WWW) is regarded as an exceptionally large data storehouse. The WWW is becoming more complicated and substantive every day. At the moment, the situation is such that we are starved for knowledge while drowning in data. Due to these factors, the data mining clustering...

Bibliographic Details
Main Authors: Mallik, Moksud Alam; Zulkurnain, Nurul Fariza; Nizamuddin, Mohammed Khaja; Sarkar, Rashal; Chalil, Aboosalih Kakkat
Format: Article
Language: English
Published: International Organization of IOTPE 2022
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/102189/7/102189_A%20spark-based%20parallel%20fuzzy%20C_SCOPUS.pdf
http://irep.iium.edu.my/102189/8/102189_A%20spark-based%20parallel%20fuzzy%20C.pdf
http://irep.iium.edu.my/102189/
https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf
Institution: Universiti Islam Antarabangsa Malaysia
id my.iium.irep.102189
record_format dspace
spelling my.iium.irep.102189 2022-12-27T02:06:59Z http://irep.iium.edu.my/102189/
A spark-based parallel fuzzy C median algorithm for web log big data
Mallik, Moksud Alam; Zulkurnain, Nurul Fariza; Nizamuddin, Mohammed Khaja; Sarkar, Rashal; Chalil, Aboosalih Kakkat
TK7885 Computer engineering
International Organization of IOTPE, 2022-09. Article, PeerReviewed.
Mallik, Moksud Alam and Zulkurnain, Nurul Fariza and Nizamuddin, Mohammed Khaja and Sarkar, Rashal and Chalil, Aboosalih Kakkat (2022) A spark-based parallel fuzzy C median algorithm for web log big data. International Journal on “Technical and Physical Problems of Engineering” (IJTPE), 14 (3). pp. 212-220. ISSN 2077-3528 https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf
institution Universiti Islam Antarabangsa Malaysia
building IIUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider International Islamic University Malaysia
content_source IIUM Repository (IREP)
url_provider http://irep.iium.edu.my/
language English
topic TK7885 Computer engineering
description Nowadays, the World Wide Web (WWW) is regarded as an exceptionally large data storehouse. The WWW is becoming more complicated and substantive every day. At the moment, the situation is such that we are starved for knowledge while drowning in data. For these reasons, data mining clustering is one of the most crucial tools for extracting useful information from the web. Numerous clustering techniques have been developed that work well on small datasets. Nevertheless, these techniques do not provide adequate results when dealing with extensive data sets. The most important problems are excessive computational complexity and lengthy evaluation time, which are not acceptable in a real-time context. It is essential to process this enormous volume of information in a timely manner. This paper proposes an efficient parallel Fuzzy C median solution based on Spark for large-scale web log data. The performance of the parallel Fuzzy C median algorithm is evaluated on the PySpark platform using the Rand Index and SSE (sum of squared error). According to the experimental findings, the parallel Fuzzy C median method built on Spark performs better.
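The description names two evaluation measures, the Rand Index and SSE, without defining them. The following is a minimal sketch of both in plain Python, not the authors' PySpark implementation. The function names `rand_index` and `sse` are hypothetical, and the sketch assumes hard (crisp) cluster assignments and squared Euclidean distance; a fuzzy variant would typically weight each term by the membership degree, which the record does not specify.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


def sse(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center.

    Assumption: each point has exactly one label (hard assignment).
    """
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total
```

A Rand Index of 1.0 means the two labelings induce the same partition, regardless of how the clusters are numbered; a lower SSE indicates tighter clusters around their centers.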
format Article
author Mallik, Moksud Alam
Zulkurnain, Nurul Fariza
Nizamuddin, Mohammed Khaja
Sarkar, Rashal
Chalil, Aboosalih Kakkat
author_sort Mallik, Moksud Alam
title A spark-based parallel fuzzy C median algorithm for web log big data
title_sort spark-based parallel fuzzy c median algorithm for web log big data
publisher International Organization of IOTPE
publishDate 2022
_version_ 1753788203347738624