Web proxy log classification for burst behavior

© 2016 IEEE. Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used for web usage analysis as well as for investigating abnormal activities, including abuse, misuse, or fraud. This paper classifies web proxy...

Bibliographic Details
Main Authors: Nattapol Kiatkumjounwong, Sudsanguan Ngamsuriyaroj, Anon Plangprasopchok
Other Authors: Mahidol University
Format: Conference or Workshop Item
Published: 2018
Subjects: Computer Science, Engineering
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/42440
Institution: Mahidol University
id th-mahidol.42440
record_format dspace
timestamp 2019-03-14T15:03:29Z
affiliations Mahidol University; Thailand National Electronics and Computer Technology Center
record dates 2018-12-21T07:22:37Z; 2019-03-14T08:03:29Z
date 2017-02-08
type Conference Paper
citation IEEE Region 10 Annual International Conference, Proceedings/TENCON. (2017), 472-477
doi 10.1109/TENCON.2016.7848044
issn 21593450; 21593442
scopus_id 2-s2.0-85015402538
source SCOPUS
scopus_url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85015402538&origin=inward
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Computer Science
Engineering
description © 2016 IEEE. Many organizations and most Internet service providers need to keep a history of web accesses in the form of proxy logs. Such logs are later used for web usage analysis as well as for investigating abnormal activities, including abuse, misuse, or fraud. This paper classifies web proxy logs into normal, non-burst, and burst. To filter out normal logs, we use the Apriori algorithm in the Weka mining tool to detect outliers based on the duration and the bandwidth of logs for file categories. Burst logs are separated from the outlier logs using threshold rates computed for file types. The experimental results show that about 80% of the logs are normal, while burst logs account for about 2% and should be further investigated for unusual behavior. Since the number of stored logs can be very large, processing them takes a long time. Thus, we measure the performance of parallel log processing on a Hadoop system with varying data sizes and numbers of nodes. We found that the speedup of log processing corresponds well to the increasing workload, which suggests that processing logs in real time is feasible.