A spark-based parallel fuzzy C median algorithm for web log big data
Nowadays, the World Wide Web (WWW) is regarded as an exceptionally large data storehouse. The WWW is becoming more complicated and substantive every day. At the moment, the situation is such that we are starved for knowledge while drowning in data. Due to these factors, the data mining clustering...
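Main Authors: | Mallik, Moksud Alam; Zulkurnain, Nurul Fariza; Nizamuddin, Mohammed Khaja; Sarkar, Rashal; Chalil, Aboosalih Kakkat |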
---|---|
Format: | Article |
Language: | English |
Published: | International Organization of IOTPE 2022 |
Subjects: | TK7885 Computer engineering |
Online Access: | http://irep.iium.edu.my/102189/7/102189_A%20spark-based%20parallel%20fuzzy%20C_SCOPUS.pdf http://irep.iium.edu.my/102189/8/102189_A%20spark-based%20parallel%20fuzzy%20C.pdf http://irep.iium.edu.my/102189/ https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf |
Institution: | Universiti Islam Antarabangsa Malaysia |
id |
my.iium.irep.102189 |
---|---|
record_format |
dspace |
spelling |
my.iium.irep.1021892022-12-27T02:06:59Z http://irep.iium.edu.my/102189/ A spark-based parallel fuzzy C median algorithm for web log big data Mallik, Moksud Alam Zulkurnain, Nurul Fariza Nizamuddin, Mohammed Khaja Sarkar, Rashal Chalil, Aboosalih Kakkat TK7885 Computer engineering Now-a-days, the World Wide Web (WWW) is regarded as an exceptionally large data storehouse. The WWW is becoming more complicated and substantive every day. At the moment, the situation is such that we are starved for knowledge while drowning in data. Due to these factors, the data mining clustering technique is one of the most crucial tools for collecting useful data from the web. Clustering techniques for small datasets have led to the development of numerous successful clustering techniques. Nevertheless, these techniques do not provide adequate results when trading with extensive data sets. The most important problems are excessive computational difficulty and lengthy evaluating time, which is not acceptable for real-time context. It is very prime to process this enormous information on time. This paper proposes an efficient parallel Fuzzy C median solution based on Spark for large-scale web log data. Based on the Rand Index and SSE (sum of squared error), the parallel Fuzzy C median algorithm's performance is evaluated in the PySpark platform. According to the experimental findings, the parallel Fuzzy C median method built on Spark performs better. International Organization of IOTPE 2022-09 Article PeerReviewed application/pdf en http://irep.iium.edu.my/102189/7/102189_A%20spark-based%20parallel%20fuzzy%20C_SCOPUS.pdf application/pdf en http://irep.iium.edu.my/102189/8/102189_A%20spark-based%20parallel%20fuzzy%20C.pdf Mallik, Moksud Alam and Zulkurnain, Nurul Fariza and Nizamuddin, Mohammed Khaja and Sarkar, Rashal and Chalil, Aboosalih Kakkat (2022) A spark-based parallel fuzzy C median algorithm for web log big data. 
International Journal on “Technical and Physical Problems of Engineering” (IJTPE), 14 (3). pp. 212-220. ISSN 2077-3528 https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf |
institution |
Universiti Islam Antarabangsa Malaysia |
building |
IIUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
International Islamic University Malaysia |
content_source |
IIUM Repository (IREP) |
url_provider |
http://irep.iium.edu.my/ |
language |
English English |
topic |
TK7885 Computer engineering |
spellingShingle |
TK7885 Computer engineering Mallik, Moksud Alam Zulkurnain, Nurul Fariza Nizamuddin, Mohammed Khaja Sarkar, Rashal Chalil, Aboosalih Kakkat A spark-based parallel fuzzy C median algorithm for web log big data |
description |
Nowadays, the World Wide Web (WWW) is regarded as an exceptionally large data storehouse. The WWW is becoming more complicated and substantive every day. At the moment, the situation is such that we are starved for knowledge while drowning in data. Due to these factors, the data mining clustering technique is one of the most crucial tools for collecting useful data from the web. Numerous successful clustering techniques have been developed for small datasets. Nevertheless, these techniques do not provide adequate results when dealing with extensive data sets. The most important problems are excessive computational complexity and lengthy evaluation time, which are not acceptable in a real-time context. It is essential to process this enormous volume of information in a timely manner. This paper proposes an efficient parallel Fuzzy C median solution based on Spark for large-scale web log data. The performance of the parallel Fuzzy C median algorithm is evaluated on the PySpark platform using the Rand Index and SSE (sum of squared error). According to the experimental findings, the parallel Fuzzy C median method built on Spark performs better. |
format |
Article |
author |
Mallik, Moksud Alam Zulkurnain, Nurul Fariza Nizamuddin, Mohammed Khaja Sarkar, Rashal Chalil, Aboosalih Kakkat |
author_facet |
Mallik, Moksud Alam Zulkurnain, Nurul Fariza Nizamuddin, Mohammed Khaja Sarkar, Rashal Chalil, Aboosalih Kakkat |
author_sort |
Mallik, Moksud Alam |
title |
A spark-based parallel fuzzy C median algorithm for web log big data |
title_short |
A spark-based parallel fuzzy C median algorithm for web log big data |
title_full |
A spark-based parallel fuzzy C median algorithm for web log big data |
title_fullStr |
A spark-based parallel fuzzy C median algorithm for web log big data |
title_full_unstemmed |
A spark-based parallel fuzzy C median algorithm for web log big data |
title_sort |
spark-based parallel fuzzy c median algorithm for web log big data |
publisher |
International Organization of IOTPE |
publishDate |
2022 |
url |
http://irep.iium.edu.my/102189/7/102189_A%20spark-based%20parallel%20fuzzy%20C_SCOPUS.pdf http://irep.iium.edu.my/102189/8/102189_A%20spark-based%20parallel%20fuzzy%20C.pdf http://irep.iium.edu.my/102189/ https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf |
_version_ |
1753788203347738624 |
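
The abstract names two evaluation measures, the Rand Index and SSE (sum of squared error). The following is a minimal sketch of how these two measures are commonly defined, not the authors' implementation. It assumes hard cluster labels (for a fuzzy method, typically the cluster with the highest membership for each point) and squared Euclidean distance for SSE; the function names and the toy data are hypothetical and do not come from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair "agrees" when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def sse(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centers[k]))
        for point, k in zip(points, labels)
    )


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
hard_labels = [0, 0, 1, 1]  # assumed: argmax of the fuzzy memberships
truth = [1, 1, 0, 0]        # reference labels; cluster names need not match

ri = rand_index(truth, hard_labels)
err = sse(points, centers, hard_labels)
print(ri, err)
```

The Rand Index compares pairs rather than label values, so it is unaffected by how the clusters are numbered. The pairwise loop is quadratic in the number of points; on data at web-log scale one would more plausibly compute it from a contingency table of label counts.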