Leveraging large language models and BERT for log parsing and anomaly detection

Computer systems and applications generate large volumes of logs to measure and record runtime information, which is vital for protecting systems from malicious attacks and useful for diagnosing faults, especially with the rapid development of distributed computing. Among the various kinds of logs, anomaly logs help operations and maintenance (O&M) personnel locate faults and improve efficiency. In this paper, we utilize a large language model, ChatGPT, for the log parsing task, and we choose BERT, a self-supervised framework, for log anomaly detection. BERT, a transformer-encoder model with a self-attention mechanism, is well suited to context-dependent tasks such as anomaly log detection. During pretraining it relies on the masked language model and next sentence prediction tasks to capture the pattern of normal log sequences. Experimental results on two log datasets show that the BERT model combined with an LLM outperforms classical models such as DeepLog and LogAnomaly.
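
The record does not reproduce the paper's parsing prompts, but LLM-based log parsing generally means asking the model to abstract each raw message into a constant template with the variable fields masked out. A minimal sketch, assuming the OpenAI chat-completions client and a hypothetical prompt (neither is specified in this record):

```python
# Minimal sketch of LLM-based log parsing. Assumptions: the OpenAI
# chat-completions API stands in for "ChatGPT", and the prompt wording
# is hypothetical -- the paper's actual prompts are not in this record.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Abstract the following raw log message into a template: replace the "
    "variable parts (IDs, IPs, paths, numbers) with <*> and return only "
    "the template string.\n\nLog: {log}"
)

def parse_log(raw_log: str) -> str:
    """Ask the LLM to turn one raw log line into a constant template."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the record only says "ChatGPT"
        messages=[{"role": "user", "content": PROMPT.format(log=raw_log)}],
        temperature=0,  # deterministic output, so repeated logs map to one template
    )
    return response.choices[0].message.content.strip()

# e.g. "Received block blk_-562925280853087685 of size 67108864 from /10.251.91.84"
# should come back as "Received block <*> of size <*> from <*>".
```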

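Similarly, one common way to use a masked-language-model objective for anomaly detection (in the spirit of LogBERT; the paper's exact scoring rule is not given in this record) is to mask each position of a parsed log sequence and flag the sequence when the model fails to rank the observed token among its top-k predictions, since pretraining on normal logs makes normal sequences predictable. A rough sketch with Hugging Face Transformers, using a generic checkpoint purely for illustration:

```python
# Sketch of masked-LM anomaly scoring over a parsed log sequence. This is
# an assumed scoring scheme (LogBERT-style), not the paper's verified method.
import torch
from transformers import BertForMaskedLM, BertTokenizer

# In practice the model would be pretrained on normal log sequences;
# the generic English checkpoint here is only a stand-in.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def is_anomalous(log_sequence: str, top_k: int = 10) -> bool:
    """Mask each token in turn; flag the sequence if any observed token
    falls outside the model's top-k predictions for that position."""
    ids = tokenizer(log_sequence, return_tensors="pt")["input_ids"][0]
    for pos in range(1, len(ids) - 1):           # skip [CLS] and [SEP]
        masked = ids.clone()
        original = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id    # hide the observed token
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        if original not in logits.topk(top_k).indices:
            return True                          # unpredictable token -> anomaly
    return False

# e.g. is_anomalous("received block <*> of size <*> from <*>") scores one
# parsed sequence; a window of log keys can be scored the same way.
```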

Bibliographic Details
Main Authors: Zhou, Yihan, Chen, Yan, Rao, Xuanming, Zhou, Yukang, Li, Yuxin, Hu, Chao
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Anomaly log detection; Large language models
Online Access: https://hdl.handle.net/10356/181426
Institution: Nanyang Technological University
Citation: Zhou, Y., Chen, Y., Rao, X., Zhou, Y., Li, Y. & Hu, C. (2024). Leveraging large language models and BERT for log parsing and anomaly detection. Mathematics, 12(17), 2758. https://dx.doi.org/10.3390/math12172758
ISSN: 2227-7390
DOI: 10.3390/math12172758
Scopus ID: 2-s2.0-85203646702
Funding: This research was sponsored in part by the National Natural Science Foundation of China (No. 62177046 and 62477046), the Hunan 14th Five-Year Plan Educational Science Research Project (No. XJK23AJD022 and XJK23AJD021), the Hunan Social Science Foundation (No. 22YBA012), the Hunan Provincial Key Research and Development Project (No. 2021SK2022), and the High Performance Computing Center of Central South University.
License: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).