Which variables should I log?

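Developers usually depend on inserting logging statements into the source code to collect system runtime information. Such logged information is valuable for software maintenance. A logging statement usually prints one or more variables to record vital system status. However, due to the lack of rigorous logging guidance and the requirement of domain-specific knowledge, it is not easy for developers to make proper decisions about which variables to log. To address this need, in this work, we propose an approach to recommend logging variables for developers during development by learning from existing logging statements. Unlike other prediction tasks in software engineering, this task has two challenges: 1) Dynamic labels – different logging statements have different sets of accessible variables, so the set of possible labels differs from sample to sample. 2) Out-of-vocabulary words – identifiers’ names are not limited to natural language words, and the test set usually contains a number of program tokens which are out of the vocabulary built from the training set and cannot be appropriately mapped to word embeddings. To deal with the first challenge, we convert this task into a representation learning problem instead of a multi-label classification problem. Given a code snippet that lacks a logging statement, our approach first leverages a neural network with an RNN (recurrent neural network) layer and a self-attention layer to learn the proper representation of each program token, and then predicts whether each token should be logged through a unified binary classifier based on the learned representation. To handle the second challenge, we propose a novel method to map program tokens into word embeddings by making use of the pre-trained word embeddings of natural language tokens. We evaluate our approach on 9 large and high-quality Java projects. Our evaluation results show that the average MAP of our approach is over 0.84, outperforming random guessing and an information-retrieval-based method by large margins.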
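The abstract names two mechanisms without spelling them out. The first is a single binary classifier that scores every accessible variable, so the number of candidates can vary from one logging statement to the next. The second is a way to embed program tokens that are missing from the vocabulary. The Python sketch that follows illustrates both under stated assumptions and is not the authors' implementation. It assumes out-of-vocabulary identifiers are split on underscores and camelCase, and that the pre-trained vectors of their in-vocabulary subtokens are averaged. It also drops the paper's RNN and self-attention encoder and feeds these context-free vectors straight into a logistic classifier. The helper names (`split_identifier`, `embed_token`, `rank_candidates`), the toy vocabulary, and the classifier weights are all hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def split_identifier(token: str) -> List[str]:
    """Split a program token on underscores and camelCase boundaries (lowercased)."""
    parts: List[str] = []
    for chunk in token.split("_"):
        # Acronyms (HTTP in HTTPServer), capitalized or lowercase words, digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def embed_token(token: str, vocab: Dict[str, Vector], dim: int) -> Vector:
    """Look up a token; if it is OOV, average the vectors of its known subtokens.

    Assumption: averaging subtoken vectors is a stand-in, not necessarily the paper's mapping.
    """
    if token in vocab:
        return vocab[token]
    known = [vocab[s] for s in split_identifier(token) if s in vocab]
    if not known:
        return [0.0] * dim  # nothing recoverable: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


def logging_probability(vec: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Unified binary classifier: logistic regression on one token representation."""
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(
    candidates: Sequence[str],
    vocab: Dict[str, Vector],
    weights: Sequence[float],
    bias: float,
) -> List[Tuple[str, float]]:
    """Score any number of accessible variables with the same classifier, best first."""
    dim = len(weights)
    scored = [(c, logging_probability(embed_token(c, vocab, dim), weights, bias)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-d "pre-trained" vectors and classifier parameters, for illustration only.
    toy_vocab = {"user": [1.0, 0.0], "id": [0.8, 0.2], "tmp": [-1.0, 0.5], "buffer": [-0.5, 1.0]}
    for name, p in rank_candidates(["userId", "tmpBuffer", "i"], toy_vocab, [2.0, -1.0], 0.0):
        print(f"{name}: {p:.3f}")
```

Because the same `weights` and `bias` score every candidate, a snippet with three accessible variables and one with thirty share one classifier. That is the benefit of per-token binary prediction over multi-label classification with a fixed label set. With the toy values, `userId` (built from `user` and `id`) ranks first. The unknown token `i` falls back to a zero vector and scores exactly 0.5. `tmpBuffer` ranks last.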

Bibliographic Details
Main Authors:	LIU, Zhongxin; XIA, Xin; LO, David; XING, Zhenchang; HASSAN, Ahmed E.; LI, Shanping
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2021
Subjects:	Log; Logging Variable; Word Embedding; Representation Learning; Data Storage Systems; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4495 https://ink.library.smu.edu.sg/context/sis_research/article/5498/viewcontent/tse197.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5498
record_format	dspace
datestamp	2022-07-26T07:39:13Z
date	2021-09-01T07:00:00Z
mimetype	application/pdf
doi	10.1109/TSE.2019.2941943
rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Log; Logging Variable; Word Embedding; Representation Learning; Data Storage Systems; Software Engineering
description	Developers usually depend on inserting logging statements into the source code to collect system runtime information. Such logged information is valuable for software maintenance. A logging statement usually prints one or more variables to record vital system status. However, due to the lack of rigorous logging guidance and the requirement of domain-specific knowledge, it is not easy for developers to make proper decisions about which variables to log. To address this need, in this work, we propose an approach to recommend logging variables for developers during development by learning from existing logging statements. Unlike other prediction tasks in software engineering, this task has two challenges: 1) Dynamic labels – different logging statements have different sets of accessible variables, so the set of possible labels differs from sample to sample. 2) Out-of-vocabulary words – identifiers’ names are not limited to natural language words, and the test set usually contains a number of program tokens which are out of the vocabulary built from the training set and cannot be appropriately mapped to word embeddings. To deal with the first challenge, we convert this task into a representation learning problem instead of a multi-label classification problem. Given a code snippet that lacks a logging statement, our approach first leverages a neural network with an RNN (recurrent neural network) layer and a self-attention layer to learn the proper representation of each program token, and then predicts whether each token should be logged through a unified binary classifier based on the learned representation. To handle the second challenge, we propose a novel method to map program tokens into word embeddings by making use of the pre-trained word embeddings of natural language tokens. We evaluate our approach on 9 large and high-quality Java projects. Our evaluation results show that the average MAP of our approach is over 0.84, outperforming random guessing and an information-retrieval-based method by large margins.
format	text
author	LIU, Zhongxin; XIA, Xin; LO, David; XING, Zhenchang; HASSAN, Ahmed E.; LI, Shanping
title	Which variables should I log?
publisher	Institutional Knowledge at Singapore Management University
publishDate	2021
url	https://ink.library.smu.edu.sg/sis_research/4495 https://ink.library.smu.edu.sg/context/sis_research/article/5498/viewcontent/tse197.pdf
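The description field reports an average MAP above 0.84. As a reminder of how such a figure is computed for ranked variable recommendations, here is a minimal Python sketch of the standard average-precision and MAP definitions. It assumes the paper uses this common formulation, where AP divides by the number of variables actually logged. The function names and the sample data are hypothetical and invented for illustration, not drawn from the evaluation.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP of one ranked list: precision@k summed over relevant ranks, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this cut-off
    # Dividing by |relevant| penalizes relevant variables that never appear in the ranking.
    return precision_sum / len(relevant)


def mean_average_precision(samples: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average AP over all (ranking, ground-truth logged variables) pairs."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in samples]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical samples: (recommended order, variables the developer actually logged).
    samples = [
        (["request", "retries", "timeout", "buffer"], {"request", "timeout"}),  # AP = (1/1 + 2/3) / 2
        (["offset", "path"], {"path"}),                                          # AP = (1/2) / 1
    ]
    print(f"MAP = {mean_average_precision(samples):.4f}")  # MAP = 0.6667
```

Because every accessible variable is ranked, each relevant variable appears somewhere in the list. Dividing by the number of retrieved relevant items would therefore give the same AP as dividing by the number of relevant items.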
