It takes two to tango: Deleted Stack Overflow question prediction with text and meta features

Stack Overflow is a popular community-based Q&A website that caters to the technical needs of software developers. As of February 2015, Stack Overflow has more than 3.9M registered users, 8.8M questions, and 41M comments. Stack Overflow provides explicit and detailed guidelines on how to post quest...

Bibliographic Details
Main Authors: XIA, Xin; LO, David; CORREA, Denzil; SUREKA, Ashish; SHIHAB, Emad
Format: text
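Language: English
Published: Institutional Knowledge at Singapore Management University, 2016
Subjects: Classification; Deleted Question; Stack Overflow; Text Processing; Computer Sciences; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/3568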
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-4569
DOI: 10.1109/COMPSAC.2016.145
Date: 2016-06-10
Collection: Research Collection School Of Computing and Information Systems
Description: Stack Overflow is a popular community-based Q&A website that caters to the technical needs of software developers. As of February 2015, Stack Overflow has more than 3.9M registered users, 8.8M questions, and 41M comments. Stack Overflow provides explicit and detailed guidelines on how to post questions, but some questions are very poor in quality. Such questions are deleted by experienced community members and moderators. Deleted questions increase maintenance cost and have an adverse impact on the user experience. Therefore, predicting deleted questions is an important task. In this study, we propose a two-stage hybrid approach, DelPredictor, which combines text processing and classification techniques to predict deleted questions. In the first stage, DelPredictor converts text in the title, body, and tag fields of questions into numerical textual features via text processing and classification techniques. In the second stage, it extracts meta features that can be categorized into profile, community, content, and syntactic features. Next, it learns and combines two independent classifiers built on the textual and meta features. We evaluate DelPredictor on 5 years (2008-2013) of deleted questions from Stack Overflow. Our experimental results show that DelPredictor improves the F1-scores over baseline prediction, a prior approach [12], and a text-based approach by 29.50%, 9.34%, and 28.11%, respectively.
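The two-classifier idea in the description can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not name the classifiers, the exact features, or the combination rule, so everything below is an assumption made for illustration. The text stage is a naive Bayes scorer, the meta stage is a tiny logistic regression over two hypothetical features (scaled asker reputation and a has-code flag), and the two scores are merged by a weighted average with an assumed weight `alpha`. The toy questions are invented, and `f1_score` shows the metric the description reports.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TextScorer:
    """Multinomial naive Bayes with add-one smoothing (assumed stand-in)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def score(self, doc):
        """Return the estimated probability that the question is deleted."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        log_p = {}
        for y in (0, 1):
            denom = sum(self.counts[y].values()) + len(self.vocab)
            log_p[y] = math.log(self.n_docs[y] / sum(self.n_docs.values()))
            log_p[y] += sum(math.log((self.counts[y][t] + 1) / denom) for t in tokens)
        top = max(log_p.values())  # subtract the max for numerical stability
        e0, e1 = math.exp(log_p[0] - top), math.exp(log_p[1] - top)
        return e1 / (e0 + e1)


class MetaScorer:
    """Logistic regression trained by batch gradient descent (assumed stand-in)."""

    def fit(self, rows, labels, lr=0.5, epochs=500):
        self.w = [0.0] * (len(rows[0]) + 1)  # w[0] is the bias term
        for _ in range(epochs):
            grad = [0.0] * len(self.w)
            for x, y in zip(rows, labels):
                err = self.score(x) - y
                grad[0] += err
                for j, v in enumerate(x):
                    grad[j + 1] += err * v
            self.w = [w - lr * g / len(rows) for w, g in zip(self.w, grad)]
        return self

    def score(self, x):
        z = self.w[0] + sum(w * v for w, v in zip(self.w[1:], x))
        return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (deleted) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: (question text, [scaled reputation, has_code], deleted?)
TRAIN = [
    ("plz help urgent code not working", [0.0, 0.0], 1),
    ("homework help plz send code", [0.1, 0.0], 1),
    ("urgent fix my error plz", [0.0, 0.0], 1),
    ("how to reverse a list in python", [0.8, 1.0], 0),
    ("difference between mutex and semaphore", [0.9, 1.0], 0),
    ("why does this regex fail on unicode input", [0.7, 1.0], 0),
]
texts, metas, labels = zip(*TRAIN)
text_model = TextScorer().fit(texts, labels)
meta_model = MetaScorer().fit(metas, labels)


def predict_deleted(text, meta, alpha=0.5, threshold=0.5):
    """Combine the two independent scores; alpha and threshold are assumed values."""
    combined = alpha * text_model.score(text) + (1 - alpha) * meta_model.score(meta)
    return 1 if combined >= threshold else 0


predictions = [predict_deleted(t, m) for t, m in zip(texts, metas)]
print("F1 on the toy training data:", f1_score(list(labels), predictions))
```

Keeping the two scorers separate and merging only their output scores mirrors the description's "two independent classifiers"; in practice the weight `alpha` would presumably be tuned on held-out data rather than fixed.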