Web proxy log classification for burst behavior
© 2016 IEEE. Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used for web usage analysis as well as for investigating abnormal activities such as abuse, misuse, or fraud. This paper classifies web proxy...
Main Authors: | Nattapol Kiatkumjounwong, Sudsanguan Ngamsuriyaroj, Anon Plangprasopchok |
---|---|
Other Authors: | Mahidol University |
Format: | Conference or Workshop Item |
Published: | 2018 |
Subjects: | Computer Science; Engineering |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/42440 |
Institution: | Mahidol University |
id | th-mahidol.42440 |
---|---|
record_format | dspace |
spelling | th-mahidol.42440; 2019-03-14T15:03:29Z; Web proxy log classification for burst behavior; Nattapol Kiatkumjounwong; Sudsanguan Ngamsuriyaroj; Anon Plangprasopchok; Mahidol University; Thailand National Electronics and Computer Technology Center; Computer Science; Engineering; © 2016 IEEE. Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used for web usage analysis as well as for investigating abnormal activities such as abuse, misuse, or fraud. This paper classifies web proxy logs into normal, non-burst, and burst. To filter out normal logs, we use the Apriori algorithm in the Weka mining tool to detect outliers based on the duration and the bandwidth of logs for file categories. Burst logs are separated from outlier logs using threshold rates computed for file types. The experimental results show that normal logs make up the majority at about 80%, while burst logs account for about 2% and should be further investigated for unusual behavior. Because the volume of stored logs can be very large, processing them takes a long time. Thus, we measure the performance of parallel log processing on a Hadoop system with varying data sizes and numbers of nodes. We found that the speedup of log processing corresponds well to the increasing workload, which suggests that processing logs in real time is feasible.; 2018-12-21T07:22:37Z; 2019-03-14T08:03:29Z; 2018-12-21T07:22:37Z; 2019-03-14T08:03:29Z; 2017-02-08; Conference Paper; IEEE Region 10 Annual International Conference, Proceedings/TENCON. (2017), 472-477; 10.1109/TENCON.2016.7848044; 21593450; 21593442; 2-s2.0-85015402538; https://repository.li.mahidol.ac.th/handle/123456789/42440; Mahidol University; SCOPUS; https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85015402538&origin=inward |
institution | Mahidol University |
building | Mahidol University Library |
continent | Asia |
country | Thailand |
content_provider | Mahidol University Library |
collection | Mahidol University Institutional Repository |
topic | Computer Science; Engineering |
spellingShingle | Computer Science; Engineering; Nattapol Kiatkumjounwong; Sudsanguan Ngamsuriyaroj; Anon Plangprasopchok; Web proxy log classification for burst behavior |
description | © 2016 IEEE. Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used for web usage analysis as well as for investigating abnormal activities such as abuse, misuse, or fraud. This paper classifies web proxy logs into normal, non-burst, and burst. To filter out normal logs, we use the Apriori algorithm in the Weka mining tool to detect outliers based on the duration and the bandwidth of logs for file categories. Burst logs are separated from outlier logs using threshold rates computed for file types. The experimental results show that normal logs make up the majority at about 80%, while burst logs account for about 2% and should be further investigated for unusual behavior. Because the volume of stored logs can be very large, processing them takes a long time. Thus, we measure the performance of parallel log processing on a Hadoop system with varying data sizes and numbers of nodes. We found that the speedup of log processing corresponds well to the increasing workload, which suggests that processing logs in real time is feasible. |
author2 | Mahidol University |
author_facet | Mahidol University; Nattapol Kiatkumjounwong; Sudsanguan Ngamsuriyaroj; Anon Plangprasopchok |
format | Conference or Workshop Item |
author | Nattapol Kiatkumjounwong; Sudsanguan Ngamsuriyaroj; Anon Plangprasopchok |
author_sort | Nattapol Kiatkumjounwong |
title | Web proxy log classification for burst behavior |
title_short | Web proxy log classification for burst behavior |
title_full | Web proxy log classification for burst behavior |
title_fullStr | Web proxy log classification for burst behavior |
title_full_unstemmed | Web proxy log classification for burst behavior |
title_sort | web proxy log classification for burst behavior |
publishDate | 2018 |
url | https://repository.li.mahidol.ac.th/handle/123456789/42440 |
_version_ | 1763492255007506432 |