Fusing multi-abstraction vector space models for concern localization

Concern localization refers to the process of locating code units that match a particular textual description. It takes as input textual documents such as bug reports and feature requests and outputs a list of candidate code units that are relevant to the bug reports or feature requests. Many inform...

Bibliographic Details
Main Authors: ZHANG, Yun, LO, David, XIA, Xin, SCANNIELLO, Giuseppe, LE, Tien-Duy B., SUN, Jianling
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
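Subjects: Concern localization; Multi-Abstraction; Text retrieval; Topic modeling; Data fusion; Software Engineering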
Online Access: https://ink.library.smu.edu.sg/sis_research/4126
https://ink.library.smu.edu.sg/context/sis_research/article/5129/viewcontent/Fusing_Multi_Abstraction_EMSE_afv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5129
record_format dspace
spelling sg-smu-ink.sis_research-5129
2019-06-12T03:30:15Z
Fusing multi-abstraction vector space models for concern localization
ZHANG, Yun
LO, David
XIA, Xin
SCANNIELLO, Giuseppe
LE, Tien-Duy B.
SUN, Jianling
2018-08-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/4126
info:doi/10.1007/s10664-017-9585-2
https://ink.library.smu.edu.sg/context/sis_research/article/5129/viewcontent/Fusing_Multi_Abstraction_EMSE_afv.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Concern localization
Multi-Abstraction
Text retrieval
Topic modeling
Data fusion
Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Concern localization
Multi-Abstraction
Text retrieval
Topic modeling
Data fusion
Software Engineering
description Concern localization is the process of locating code units that match a particular textual description. It takes textual documents such as bug reports and feature requests as input and outputs a list of candidate code units relevant to them. Many information retrieval (IR) based concern localization techniques have been proposed in the literature. These techniques typically represent code units and textual descriptions as a bag of tokens at a single level of abstraction, e.g., each token is a word, or each token is a topic. In this work, we propose a multi-abstraction concern localization technique named MULAB. MULAB represents a code unit and a textual description at multiple abstraction levels. The similarity between a textual description and a code unit is then computed by considering all of these abstraction levels. We combine a vector space model (VSM) and multiple topic models to compute the similarity, and apply a genetic algorithm to infer semi-optimal topic model configurations. We also propose 12 variants of MULAB that use different data fusion methods. We evaluated our solution on 175 concerns from 9 open source Java software systems. The experimental results show that the variant COMBMNZ-DEF performs better than the other variants, and also outperforms the state-of-the-art baseline PR (a PageRank-based algorithm) proposed by Scanniello et al. (Empir Softw Eng 20(6): 1666-1720, 2015) in terms of effectiveness and rank.
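The data fusion step mentioned in the description can be illustrated with a minimal sketch of the classic CombMNZ rule: sum a code unit's normalized scores across models, then multiply by the number of models that gave it a nonzero score. This is not taken from the paper and is not claimed to reproduce the COMBMNZ-DEF variant. The min-max normalization, the function names, the file names, and the scores are all assumptions made for illustration.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1] (assumed normalization scheme)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking signal, so map everything to 0.0.
        return {unit: 0.0 for unit in scores}
    return {unit: (s - lo) / (hi - lo) for unit, s in scores.items()}


def comb_mnz(score_lists: List[Dict[str, float]]) -> Dict[str, float]:
    """CombMNZ: (sum of normalized scores) * (number of nonzero scores)."""
    normalized = [min_max_normalize(s) for s in score_lists]
    units = set().union(*normalized)
    fused: Dict[str, float] = {}
    for unit in units:
        nonzero = [n[unit] for n in normalized if n.get(unit, 0.0) > 0.0]
        fused[unit] = float(sum(nonzero)) * len(nonzero)
    return fused


# Hypothetical similarity scores of three code units for one concern,
# one dict per abstraction level (word-level VSM and one topic model).
vsm_scores = {"Parser.java": 0.9, "Lexer.java": 0.3, "Main.java": 0.1}
topic_scores = {"Parser.java": 0.6, "Lexer.java": 0.8, "Main.java": 0.2}

fused = comb_mnz([vsm_scores, topic_scores])
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked)
```

Multiplying by the nonzero count rewards code units that several abstraction levels agree on, which is the usual motivation for CombMNZ over a plain score sum.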
format text
author ZHANG, Yun
LO, David
XIA, Xin
SCANNIELLO, Giuseppe
LE, Tien-Duy B.
SUN, Jianling
title Fusing multi-abstraction vector space models for concern localization
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/4126
https://ink.library.smu.edu.sg/context/sis_research/article/5129/viewcontent/Fusing_Multi_Abstraction_EMSE_afv.pdf