PTM4Tag+: Tag recommendation of Stack Overflow posts with pre-trained models

Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in efficiently organizing the content on Stack Overflow and are vital to support various site operations, such as querying relevant content. Poorly chosen tags often lead to issues such as tag ambiguity and tag explosion. Therefore, a precise and accurate automated tag recommendation technique is needed. Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs in language modeling. PTM4Tag+ is implemented with a triplet architecture, which encodes the three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize a number of popular pre-trained models, including BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT) and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5). Our results show that leveraging CodeT5 under the PTM4Tag+ framework achieves the best performance among the eight considered PTMs and outperforms the state-of-the-art Convolutional Neural Network-based approach by a substantial margin in terms of average Precision@k, Recall@k, and F1-score@k (k ranges from 1 to 5). Specifically, CodeT5 improves F1-score@1-5 by 8.8%, 12.4%, 15.3%, 16.4%, and 16.6%, respectively. Moreover, to address the concern of inference latency, we experimented with PTM4Tag+ using smaller PTMs (i.e., DistilBERT, DistilRoBERTa, CodeBERT-small, and CodeT5-small). We find that although smaller PTMs cannot outperform larger ones, they still maintain over 93.96% of the performance on average while reducing the mean inference time by more than 47.2%.
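The triplet architecture described above (one independent PTM per post component, with the three representations combined for multi-label tag prediction) can be illustrated with a short sketch. This is a minimal illustration assuming the Hugging Face transformers library and CodeBERT as the shared checkpoint; the class name TripletTagger, the tag-vocabulary size, and the concatenate-then-classify head are assumptions made for exposition, not the authors' implementation.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TripletTagger(nn.Module):
    """Hypothetical triplet encoder: one PTM per post component."""
    def __init__(self, checkpoint="microsoft/codebert-base", num_tags=1000):
        # num_tags is a hypothetical tag-vocabulary size, not from the paper.
        super().__init__()
        # Independent encoders for Title, Description, and Code.
        self.title_enc = AutoModel.from_pretrained(checkpoint)
        self.desc_enc = AutoModel.from_pretrained(checkpoint)
        self.code_enc = AutoModel.from_pretrained(checkpoint)
        hidden = self.title_enc.config.hidden_size
        # Concatenate the three pooled vectors, then a multi-label head.
        self.head = nn.Linear(3 * hidden, num_tags)

    def forward(self, title, desc, code):
        # First-token ([CLS]-style) pooling of each component.
        t = self.title_enc(**title).last_hidden_state[:, 0]
        d = self.desc_enc(**desc).last_hidden_state[:, 0]
        c = self.code_enc(**code).last_hidden_state[:, 0]
        logits = self.head(torch.cat([t, d, c], dim=-1))
        return torch.sigmoid(logits)  # independent per-tag probabilities

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
enc = lambda s: tok(s, return_tensors="pt", truncation=True, max_length=512)
model = TripletTagger()
probs = model(enc("How do I sort a dict by value?"),
              enc("I have a dictionary and want it ordered by its values."),
              enc("sorted(d.items(), key=lambda kv: kv[1])"))
top5 = probs.topk(5).indices  # indices of the 5 highest-scoring tags

Training such a head would use a binary cross-entropy loss per tag, since a post can carry several tags at once; that design choice follows from the multi-label framing in the abstract.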


Bibliographic Details
Main Authors: HE, Junda, XU, Bowen, YANG, Zhou, HAN, DongGyun, YANG, Chengran, LIU, Jiakun, ZHAO, Zhipeng, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2025
Subjects: Pre-trained models; Stack overflow; Tag recommendation; Transformer; Software Engineering
Online Access:https://ink.library.smu.edu.sg/sis_research/9846
https://ink.library.smu.edu.sg/context/sis_research/article/10846/viewcontent/PTM4Tag__sv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10846
record_format dspace
spelling sg-smu-ink.sis_research-10846 2024-12-24T03:25:28Z
publishDate 2025-02-01T08:00:00Z
format text application/pdf
doi info:doi/10.1007/s10664-024-10576-z
license http://creativecommons.org/licenses/by/3.0/
collection Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Pre-trained models
Stack overflow
Tag recommendation
Transformer
Software Engineering
description Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in efficiently organizing the content on Stack Overflow and are vital to support various site operations, such as querying relevant content. Poorly chosen tags often lead to issues such as tag ambiguity and tag explosion. Therefore, a precise and accurate automated tag recommendation technique is needed. Inspired by the recent success of pre-trained models (PTMs) in natural language processing (NLP), we present PTM4Tag+, a tag recommendation framework for Stack Overflow posts that utilizes PTMs in language modeling. PTM4Tag+ is implemented with a triplet architecture, which encodes the three key components of a post, i.e., Title, Description, and Code, with independent PTMs. We utilize a number of popular pre-trained models, including BERT-based models (e.g., BERT, RoBERTa, CodeBERT, BERTOverflow, and ALBERT) and encoder-decoder models (e.g., PLBART, CoTexT, and CodeT5). Our results show that leveraging CodeT5 under the PTM4Tag+ framework achieves the best performance among the eight considered PTMs and outperforms the state-of-the-art Convolutional Neural Network-based approach by a substantial margin in terms of average Precision@k, Recall@k, and F1-score@k (k ranges from 1 to 5). Specifically, CodeT5 improves F1-score@1-5 by 8.8%, 12.4%, 15.3%, 16.4%, and 16.6%, respectively. Moreover, to address the concern of inference latency, we experimented with PTM4Tag+ using smaller PTMs (i.e., DistilBERT, DistilRoBERTa, CodeBERT-small, and CodeT5-small). We find that although smaller PTMs cannot outperform larger ones, they still maintain over 93.96% of the performance on average while reducing the mean inference time by more than 47.2%.
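For reference, the ranking metrics named in the description (Precision@k, Recall@k, F1-score@k) are conventionally computed per post over the top-k recommended tags and then averaged across the test set. The sketch below uses these textbook definitions; the paper's exact formulas (e.g., the recall denominator or the averaging scheme) may differ, and the example tags are hypothetical.

def metrics_at_k(recommended, ground_truth, k):
    """Textbook Precision@k / Recall@k / F1@k for one post.

    recommended: tags ranked by predicted score;
    ground_truth: set of the post's true tags.
    """
    hits = len(set(recommended[:k]) & ground_truth)
    precision = hits / k
    recall = hits / len(ground_truth)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical post: two of the top-5 recommendations are correct.
p, r, f1 = metrics_at_k(["python", "dictionary", "sorting", "list", "pandas"],
                        {"python", "sorting"}, k=5)
# p = 0.4, r = 1.0, f1 = 2 * 0.4 * 1.0 / 1.4 ≈ 0.57; the paper reports such
# values averaged over all test posts for k = 1..5.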
format text
author HE, Junda
XU, Bowen
YANG, Zhou
HAN, DongGyun
YANG, Chengran
LIU, Jiakun
ZHAO, Zhipeng
LO, David
author_sort HE, Junda
title PTM4Tag+: Tag recommendation of Stack Overflow posts with pre-trained models
publisher Institutional Knowledge at Singapore Management University
publishDate 2025
url https://ink.library.smu.edu.sg/sis_research/9846
https://ink.library.smu.edu.sg/context/sis_research/article/10846/viewcontent/PTM4Tag__sv.pdf
_version_ 1821237248422576128