Domain-specific cross-language relevant question retrieval

In the software development process, developers often seek solutions to the technical problems they encounter by searching for relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in their native language (e.g., Chinese), they could translate their query and search on...

Bibliographic Details
Main Authors: XU, Bowen; XING, Zhenchang; XIA, Xin; LO, David; WANG, Qingye; LI, Shanping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects: Cross-language question retrieval; Domain-specific translation; Programming Languages and Compilers
Online Access: https://ink.library.smu.edu.sg/sis_research/3562
https://ink.library.smu.edu.sg/context/sis_research/article/4563/viewcontent/Domain_specific_cross_language_relevant_question_retrieval__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4563
record_format dspace
spelling sg-smu-ink.sis_research-4563
2017-04-10T07:31:56Z
Domain-specific cross-language relevant question retrieval
XU, Bowen
XING, Zhenchang
XIA, Xin
LO, David
WANG, Qingye
LI, Shanping
2016-05-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/3562
info:doi/10.1145/2901739.2901746
https://ink.library.smu.edu.sg/context/sis_research/article/4563/viewcontent/Domain_specific_cross_language_relevant_question_retrieval__1_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Cross-language question retrieval; Domain-specific translation
Programming Languages and Compilers
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Cross-language question retrieval; Domain-specific translation
Programming Languages and Compilers
spellingShingle Cross-language question retrieval; Domain-specific translation
Programming Languages and Compilers
XU, Bowen
XING, Zhenchang
XIA, Xin
LO, David
WANG, Qingye
LI, Shanping
Domain-specific cross-language relevant question retrieval
description In the software development process, developers often seek solutions to the technical problems they encounter by searching for relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in their native language (e.g., Chinese), they could translate their query and search the Q&A sites in another language (e.g., English). However, developers who are non-native English speakers are often not comfortable asking or searching for questions in English, as they do not know the proper English translation of Chinese technical terms. Furthermore, manually formulating cross-language queries and determining the weight of query words is tedious and time-consuming. To help Chinese developers take advantage of the rich knowledge base of the English version of Stack Overflow and to simplify the retrieval process, we propose an automated cross-language relevant question retrieval (CLRQR) system that retrieves relevant English questions on Stack Overflow for a given Chinese question. Our CLRQR system first extracts essential information (both Chinese and English) from the title and description of the input Chinese question, then performs domain-specific translation of the essential Chinese information into English, and finally formulates a query with the highest-scored English words for retrieving relevant questions from a repository of 684,599 Java questions in English from Stack Overflow. To evaluate the performance of our proposed approach, we also propose four online retrieval approaches as baselines. We randomly select 80 Java questions from SegmentFault and V2EX (two Chinese Q&A websites for computer programming) as the query Chinese questions. Each approach returns the top-10 most relevant questions for a given Chinese question. We invite 5 users to evaluate the relevance of the retrieved English questions. The experimental results show that the CLRQR system outperforms the four baseline approaches, and statistical tests show that the improvements are significant.
format text
author XU, Bowen
XING, Zhenchang
XIA, Xin
LO, David
WANG, Qingye
LI, Shanping
author_facet XU, Bowen
XING, Zhenchang
XIA, Xin
LO, David
WANG, Qingye
LI, Shanping
author_sort XU, Bowen
title Domain-specific cross-language relevant question retrieval
title_short Domain-specific cross-language relevant question retrieval
title_full Domain-specific cross-language relevant question retrieval
title_fullStr Domain-specific cross-language relevant question retrieval
title_full_unstemmed Domain-specific cross-language relevant question retrieval
title_sort domain-specific cross-language relevant question retrieval
publisher Institutional Knowledge at Singapore Management University
publishDate 2016
url https://ink.library.smu.edu.sg/sis_research/3562
https://ink.library.smu.edu.sg/context/sis_research/article/4563/viewcontent/Domain_specific_cross_language_relevant_question_retrieval__1_.pdf
_version_ 1770573304092950528
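
Illustrative sketch (not part of the catalog record): the description field outlines a three-step pipeline of extracting essential Chinese and English terms, translating the Chinese terms with a domain-specific vocabulary, and querying with the highest-scored English words. The Python below is a toy rendering of that flow under loud assumptions. The glossary entries, the three-question repository, the function names, and the IDF-style word score are all hypothetical stand-ins chosen for illustration; the record does not say how CLRQR scores words or ranks questions.

```python
import math
import re

# Hypothetical domain-specific glossary (illustrative entries, not from the paper).
GLOSSARY = {
    "线程": "thread",
    "死锁": "deadlock",
    "异常": "exception",
    "空指针": "null pointer",
}

# Toy stand-in for the repository of English Java questions.
REPOSITORY = [
    "Java thread deadlock when using synchronized methods",
    "Why do I get a null pointer exception in Java",
    "How to parse a JSON string in Java",
]


def tokenize(text):
    """Lower-case the text and keep only runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def extract_terms(question):
    """Step 1: pull English words and known Chinese glossary terms from the question."""
    english = tokenize(question)
    chinese = [term for term in GLOSSARY if term in question]
    return english, chinese


def translate(chinese_terms):
    """Step 2: map Chinese terms to English words via the glossary."""
    words = []
    for term in chinese_terms:
        words.extend(GLOSSARY[term].split())
    return words


def idf(word, docs):
    """Assumed scoring: smoothed inverse document frequency over the repository."""
    df = sum(1 for doc in docs if word in doc)
    return math.log((len(docs) + 1) / (df + 1)) + 1


def formulate_query(question, docs, k=2):
    """Step 3: keep the k highest-scored candidate words (ties broken alphabetically)."""
    english, chinese = extract_terms(question)
    candidates = set(english) | set(translate(chinese))
    ranked = sorted(candidates, key=lambda w: (-idf(w, docs), w))
    return ranked[:k]


def retrieve(question, top_n=10, k=2):
    """Rank repository questions by the summed score of the query words they contain."""
    docs = [set(tokenize(q)) for q in REPOSITORY]
    query = formulate_query(question, docs, k)
    scored = [
        (sum(idf(w, docs) for w in query if w in doc), text)
        for doc, text in zip(docs, REPOSITORY)
    ]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps repository order on ties
    return [text for score, text in scored if score > 0][:top_n]


if __name__ == "__main__":
    # "What to do when two threads deadlock in Java"
    for hit in retrieve("Java 中两个线程出现死锁怎么办"):
        print(hit)
```

In this toy run the word "java" appears in every repository question, so it scores lowest and is dropped from the query, while the translated words "thread" and "deadlock" are kept. That is the kind of effect word weighting is meant to have, though the actual system may weight words quite differently.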