Challenges in analyzing software documentation in Portuguese
Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural lan...
Main Authors: | TREUDE, Christoph; PROLO, Carlos A.; FIGUEIRA FILHO, Fernando |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2015 |
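Subjects: | Documentation; natural language processing; Programming Languages and Compilers; Software Engineering |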
Online Access: | https://ink.library.smu.edu.sg/sis_research/8943 https://ink.library.smu.edu.sg/context/sis_research/article/9946/viewcontent/sbes15.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-9946 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-9946 2024-07-04T08:44:21Z Challenges in analyzing software documentation in Portuguese TREUDE, Christoph PROLO, Carlos A. FIGUEIRA FILHO, Fernando Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese. 2015-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8943 info:doi/10.1109/SBES.2015.27 https://ink.library.smu.edu.sg/context/sis_research/article/9946/viewcontent/sbes15.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Documentation natural language processing Programming Languages and Compilers Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Documentation; natural language processing; Programming Languages and Compilers; Software Engineering |
description |
Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese. |
format |
text |
author |
TREUDE, Christoph; PROLO, Carlos A.; FIGUEIRA FILHO, Fernando |
author_sort |
TREUDE, Christoph |
title |
Challenges in analyzing software documentation in Portuguese |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2015 |
url |
https://ink.library.smu.edu.sg/sis_research/8943 https://ink.library.smu.edu.sg/context/sis_research/article/9946/viewcontent/sbes15.pdf |
_version_ |
1814047653786288128 |