A spark-based parallel fuzzy C median algorithm for web log big data
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | International Organization of IOTPE, 2022 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/102189/7/102189_A%20spark-based%20parallel%20fuzzy%20C_SCOPUS.pdf http://irep.iium.edu.my/102189/8/102189_A%20spark-based%20parallel%20fuzzy%20C.pdf http://irep.iium.edu.my/102189/ https://www.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue52-Vol14-No3-Sep2022/29-IJTPE-Issue52-Vol14-No3-Sep2022-pp212-220.pdf |
Institution: | Universiti Islam Antarabangsa Malaysia |
Summary: | Nowadays, the World Wide Web (WWW) is regarded as an exceptionally large data repository, and it grows larger and more complex every day. We are, in effect, starved for knowledge while drowning in data. For these reasons, clustering is one of the most important data mining techniques for extracting useful knowledge from the web. Research on small datasets has produced many successful clustering techniques; however, these techniques do not deliver adequate results on large datasets. The main problems are high computational complexity and long execution times, which are unacceptable in real-time settings, so it is essential to process this enormous volume of data in a timely manner. This paper proposes an efficient parallel Fuzzy C median solution based on Spark for large-scale web log data. The performance of the parallel Fuzzy C median algorithm is evaluated on the PySpark platform using the Rand Index and the SSE (sum of squared errors). According to the experimental findings, the Spark-based parallel Fuzzy C median method performs better. |
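The summary names the algorithm and its two evaluation metrics but gives no formulas. The following is a minimal single-machine sketch under stated assumptions, not the paper's Spark implementation. It assumes a common fuzzy C-medians formulation (L1 distance in the membership update, per-dimension weighted median as the center update), a hypothetical deterministic farthest-point initialization, the standard pair-counting Rand Index, and SSE taken as the squared Euclidean distance from each point to the center of its highest-membership cluster. All function names are hypothetical. In a Spark version, the per-point `memberships` step is the natural candidate for a distributed map, but how the paper actually partitions the work is not stated in this record.

```python
from itertools import combinations


def l1(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]


def memberships(point, centers, m):
    """Fuzzy membership of one point in every cluster, using L1 distances.

    Assumed formulation: u_i = 1 / sum_k (d_i / d_k) ** (1 / (m - 1)).
    This step is independent across points, so it could run in a
    distributed map; that mapping is an assumption, not the paper's design.
    """
    d = [l1(point, c) for c in centers]
    for i, di in enumerate(d):
        if di == 0.0:  # point sits on a center: full membership there
            return [1.0 if k == i else 0.0 for k in range(len(centers))]
    p = 1.0 / (m - 1.0)  # exponent for an unsquared L1 objective
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def init_centers(points, c):
    """Hypothetical deterministic init: first point, then farthest points."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(l1(p, q) for q in centers))
        centers.append(list(far))
    return centers


def fuzzy_c_medians(points, c, m=2.0, max_iter=100):
    """Alternate membership and weighted-median center updates until stable."""
    dim = len(points[0])
    centers = init_centers(points, c)
    for _ in range(max_iter):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights u_ij ** m
            new_centers.append(
                [weighted_median([p[k] for p in points], w) for k in range(dim)]
            )
        if new_centers == centers:  # medians are data values, so exact match
            break
        centers = new_centers
    u = [memberships(p, centers, m) for p in points]
    labels = [max(range(c), key=row.__getitem__) for row in u]  # hard labels
    return centers, labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def sse(points, centers, labels):
    """Sum of squared Euclidean distances to each point's assigned center."""
    return sum(
        sum((x - y) ** 2 for x, y in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
    truth = [0, 0, 0, 1, 1, 1]
    centers, labels = fuzzy_c_medians(pts, 2)
    print(centers, labels)
    print("Rand Index:", rand_index(truth, labels))
    print("SSE:", sse(pts, centers, labels))
```

Run as a script, this toy example converges to centers [0.1, 0.1] and [5.1, 5.0], recovers the two groups exactly (Rand Index 1.0), and reports an SSE of about 0.11. The weighted median makes each center update less sensitive to outliers than the weighted mean used in fuzzy c-means, but, as with other alternating-optimization clusterers, the result still depends on initialization.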