Post2Vec: Learning distributed representations of Stack Overflow posts

Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or aid various downstream software engineering tasks. A common step performed by those solutions is to extract suitable representations of posts; typically, in the form of meaningful vecto...

Bibliographic Details
Main Authors: XU, Bowen, HOANG, Thong, SHARMA, Abhishek, YANG, Chengran, XIA, Xin, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7638
https://ink.library.smu.edu.sg/context/sis_research/article/8641/viewcontent/TSE21_Post2Vec_preprint.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8641
record_format dspace
spelling sg-smu-ink.sis_research-8641 2023-01-10T03:54:44Z
2022-02-01T08:00:00Z text application/pdf
info:doi/10.1109/TSE.2021.3093761
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Deep Learning Artificial Intelligence
Recommender Systems
Software Engineering
Vectors
Distributed Representations
description Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step in those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose a specialized deep learning architecture, Post2Vec, which extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec’s deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25% improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10%, 7%, and 10% in terms of F1-score, F1-score, and correctness, respectively). We release our replication package at https://github.com/maxxbw/Post2Vec.
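The description reports tag recommendation results as F1-score@5 but this record does not define the metric. The following is a minimal sketch of one common formulation, assuming precision is computed over the top k recommended tags and recall over all ground-truth tags of a post, with the per-post scores averaged. The function names `f1_at_k` and `mean_f1_at_k` are hypothetical, and the paper may use a variant (for example, a capped recall), so this should not be read as the authors' evaluation code.

```python
from typing import Iterable, Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """F1@k for a single post (assumed definition, not taken from the record).

    ranked_tags: recommended tags, best first, assumed to contain no duplicates.
    true_tags:   tags actually assigned to the post.
    """
    top_k = ranked_tags[:k]
    hits = sum(1 for tag in top_k if tag in true_tags)
    if hits == 0 or not true_tags:
        return 0.0
    precision = hits / k               # fraction of the k recommendations that are correct
    recall = hits / len(true_tags)     # fraction of the true tags that were recovered
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(
    predictions: Iterable[Sequence[str]],
    references: Iterable[Set[str]],
    k: int = 5,
) -> float:
    """Average F1@k over a collection of posts."""
    scores = [f1_at_k(p, r, k) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Illustrative, made-up tags: 2 of the 5 recommendations match 3 true tags.
    recommended = ["python", "pandas", "numpy", "java", "c"]
    actual = {"python", "pandas", "csv"}
    print(f1_at_k(recommended, actual, k=5))
```

With two correct tags among five recommendations and three true tags, precision is 2/5 and recall is 2/3, so the harmonic mean works out to 0.5.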
format text
author XU, Bowen
HOANG, Thong
SHARMA, Abhishek
YANG, Chengran
XIA, Xin
LO, David
title Post2Vec: Learning distributed representations of Stack Overflow posts
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7638
https://ink.library.smu.edu.sg/context/sis_research/article/8641/viewcontent/TSE21_Post2Vec_preprint.pdf