POSIT: Simultaneously tagging natural and programming languages

Software developers use a mix of source code and natural language text to communicate with each other: Stack Overflow and developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reu...

Bibliographic Details
Main Authors:	PÂRȚACHI, Profir-Petru; DASH, Santanu; TREUDE, Christoph; BARR, Earl T.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2020
Subjects:	Code-switching; Language identification; Mixed-code; Part-of-speech tagging; Programming Languages and Compilers; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/8907 https://ink.library.smu.edu.sg/context/sis_research/article/9910/viewcontent/icse20a.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9910
record_format	dspace
spelling	sg-smu-ink.sis_research-9910 2024-06-27T08:10:22Z POSIT: Simultaneously tagging natural and programming languages PÂRȚACHI, Profir-Petru DASH, Santanu TREUDE, Christoph BARR, Earl T. Software developers use a mix of source code and natural language text to communicate with each other: Stack Overflow and developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reuse via precise extraction of code snippets from mixed text. In this paper, we borrow code-switching techniques from Natural Language Processing and adapt them to mixed text to solve two problems: language identification and token tagging. Our technique, POSIT, simultaneously assigns abstract syntax tree (AST) tags to source code tokens and part-of-speech (PoS) tags to natural language words, and predicts the source language of each token in mixed text. To realize POSIT, we trained a biLSTM network with a Conditional Random Field output layer, using AST tags from the CLANG compiler and PoS tags from the standard Stanford part-of-speech tagger. POSIT improves on the state of the art in accuracy by 10.6% for language identification and by 23.7% for PoS/AST tagging. 2020-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8907 info:doi/10.1145/3377811.3380440 https://ink.library.smu.edu.sg/context/sis_research/article/9910/viewcontent/icse20a.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Code-switching Language identification Mixed-code Part-of-speech tagging Programming Languages and Compilers Software Engineering
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Code-switching; Language identification; Mixed-code; Part-of-speech tagging; Programming Languages and Compilers; Software Engineering
description	Software developers use a mix of source code and natural language text to communicate with each other: Stack Overflow and developer mailing lists abound with this mixed text. Tagging this mixed text is essential for making progress on two seminal software engineering problems: traceability, and reuse via precise extraction of code snippets from mixed text. In this paper, we borrow code-switching techniques from Natural Language Processing and adapt them to mixed text to solve two problems: language identification and token tagging. Our technique, POSIT, simultaneously assigns abstract syntax tree (AST) tags to source code tokens and part-of-speech (PoS) tags to natural language words, and predicts the source language of each token in mixed text. To realize POSIT, we trained a biLSTM network with a Conditional Random Field output layer, using AST tags from the CLANG compiler and PoS tags from the standard Stanford part-of-speech tagger. POSIT improves on the state of the art in accuracy by 10.6% for language identification and by 23.7% for PoS/AST tagging.
format	text
author	PÂRȚACHI, Profir-Petru; DASH, Santanu; TREUDE, Christoph; BARR, Earl T.
author_sort	PÂRȚACHI, Profir-Petru
title	POSIT: Simultaneously tagging natural and programming languages
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/sis_research/8907 https://ink.library.smu.edu.sg/context/sis_research/article/9910/viewcontent/icse20a.pdf
_version_	1814047627925258240
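The record's description states that POSIT uses a biLSTM network with a Conditional Random Field (CRF) output layer. To illustrate roughly what a CRF output layer does at prediction time, here is standard Viterbi decoding over per-token emission scores (the kind a biLSTM would produce) and tag-to-tag transition scores. This is a generic sketch, not POSIT's implementation. The function name `viterbi_decode`, the combined "language:tag" labels (`EN:VB`, `EN:NN`, `C:IDENT`), and all scores are hypothetical.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t]: score for tag t at token i (what a biLSTM would emit).
    transitions[(a, b)]: score for tag a followed by tag b; missing pairs score 0.0.
    """
    # best[t]: score of the best path that ends in tag t at the current token
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Choose the predecessor tag that maximises path score + transition score.
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical combined "language:tag" labels for the mixed text "call malloc".
    tags = ["EN:VB", "EN:NN", "C:IDENT"]
    emissions = [
        {"EN:VB": 2.0, "EN:NN": 1.0, "C:IDENT": 0.5},  # "call"
        {"EN:VB": 0.2, "EN:NN": 1.0, "C:IDENT": 0.9},  # "malloc"
    ]
    # Hypothetical bonus for an English verb followed by a C identifier.
    transitions = {("EN:VB", "C:IDENT"): 1.0}
    print(viterbi_decode(emissions, transitions, tags))  # → ['EN:VB', 'C:IDENT']
```

With these toy scores, a per-token argmax would label `malloc` as `EN:NN` (1.0 > 0.9). The transition bonus from `EN:VB` to `C:IDENT` lets the decoder choose the code reading instead. That is the kind of context sensitivity a CRF layer adds on top of independent per-token predictions.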

Similar items