Clustering classes in packages for program comprehension

During software maintenance and evolution, one of the important tasks faced by developers is to understand a system quickly and accurately. With the increasing size and complexity of an evolving system, program comprehension becomes an increasingly difficult activity. Given a target system for compr...

Bibliographic Details
Main Authors: SUN, Xiaobing, LIU, Xiangyue, LI, Bin, LI, Bixin, LO, David, LIAO, Lingzhi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
Online Access: https://ink.library.smu.edu.sg/sis_research/3801
https://ink.library.smu.edu.sg/context/sis_research/article/4803/viewcontent/3787053.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4803
record_format dspace
spelling sg-smu-ink.sis_research-4803 2019-07-23T02:04:24Z
2017-04-01T07:00:00Z text application/pdf
info:doi/10.1155/2017/3787053
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Based clustering
Empirical studies
Latent Dirichlet allocations
Latent Semantic Indexing
Probabilistic latent semantic analysis
Program comprehension
Software maintenance and evolution
Software project
Programming Languages and Compilers
Software Engineering
description During software maintenance and evolution, developers must understand a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may first focus on comprehending its packages, which vary in size. Small packages are easy to comprehend, but large packages are difficult to understand. In this article, we focus on these large packages and propose a novel program comprehension approach that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes within them. Each large package is thus separated into small clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic that labels each cluster is useful for program comprehension.
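The idea in the description, splitting a large package into topic-labelled clusters of classes with LDA, can be illustrated with a minimal sketch. This is not the authors' implementation: the collapsed Gibbs sampler, the rule of assigning each class to its dominant topic, the hyperparameters, and every name here (`tokenize`, `lda_gibbs`, `cluster_classes`, `SAMPLE_PACKAGE` and its class names) are assumptions made for illustration only.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split camelCase and snake_case identifiers into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # words per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, w, z, +1)
        assignments.append(zs)

    # Resample each word's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                update(d, w, z, +1)
    return doc_topic, topic_word


def cluster_classes(class_texts, num_topics, **lda_options):
    """Assumed rule: each class joins the cluster of its dominant topic."""
    names = list(class_texts)
    docs = [tokenize(class_texts[name]) for name in names]
    doc_topic, topic_word = lda_gibbs(docs, num_topics, **lda_options)
    clusters = defaultdict(list)
    for name, counts in zip(names, doc_topic):
        dominant = max(range(num_topics), key=counts.__getitem__)
        clusters[dominant].append(name)
    # Label each cluster with the most frequent words of its topic.
    labels = {
        k: [w for w, c in topic_word[k].most_common(3) if c > 0] for k in clusters
    }
    return dict(clusters), labels


# Hypothetical package: class name -> identifiers found in its source.
SAMPLE_PACKAGE = {
    "HttpRequestParser": "parseRequest readHeader readBody httpRequest header",
    "HttpResponseWriter": "writeResponse writeHeader httpResponse header body",
    "ImageScaler": "scaleImage pixelBuffer resizeWidth resizeHeight",
    "ImageFilter": "blurImage pixelBuffer applyFilter sharpenImage",
}

if __name__ == "__main__":
    clusters, labels = cluster_classes(SAMPLE_PACKAGE, num_topics=2)
    for topic, members in clusters.items():
        print(topic, labels[topic], members)
```

Because Gibbs sampling is stochastic, the grouping depends on the seed, the number of topics, and the iteration count; on a real package the input text would come from the identifiers and comments of each class rather than a hand-written dictionary.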
format text
author SUN, Xiaobing
LIU, Xiangyue
LI, Bin
LI, Bixin
LO, David
LIAO, Lingzhi
author_sort SUN, Xiaobing
title Clustering classes in packages for program comprehension
publisher Institutional Knowledge at Singapore Management University
publishDate 2017