A hybrid method of feature extraction and naive bayes classification for splitting identifiers

Integrating natural language processing techniques into software systems has attracted the attention of many researchers. Such integration can take the form of analyzing the morphology of source code in order to extract meaningful information. Feature location is the process of identifying speci...

Bibliographic Details
Main Authors: Alanee, Nahla; Azmi Murad, Masrah Azrifah
Format: Article
Language: English
Published: Asian Research Publication Network 2017
Online Access: http://psasir.upm.edu.my/id/eprint/60666/1/A%20hybrid%20method%20of%20feature%20extraction%20and%20naive%20bayes%20classification%20for%20splitting%20identifiers.pdf
http://psasir.upm.edu.my/id/eprint/60666/
http://www.jatit.org
Institution: Universiti Putra Malaysia
id my.upm.eprints.60666
record_format eprints
spelling my.upm.eprints.60666 2018-05-17T07:52:38Z http://psasir.upm.edu.my/id/eprint/60666/
Alanee, Nahla and Azmi Murad, Masrah Azrifah (2017) A hybrid method of feature extraction and naive bayes classification for splitting identifiers. Journal of Theoretical and Applied Information Technology, 95 (7). 1549-1557. ISSN 1992-8645; ESSN: 1817-3195 http://www.jatit.org
Asian Research Publication Network 2017 Article PeerReviewed text en
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Integrating natural language processing techniques into software systems has attracted the attention of many researchers. Such integration can take the form of analyzing the morphology of source code in order to extract meaningful information. Feature location is the process of identifying specific portions of the source code. Among the most important pieces of information in source code are the identifiers (e.g. Student). Unlike in traditional text processing, identifiers in source code are often formed from multiple words, such as ‘Employee-Name’. These words are not separated by white space; instead, they are joined using special characters (e.g. Employee_ID), CamelCase (e.g. EmployeeName), or abbreviations (e.g. EmpNm). This makes extracting the words in such identifiers more challenging. Several approaches have been proposed to address the problem of splitting multi-word identifiers. However, there is still room for improvement in terms of accuracy. Such improvement can be achieved by using more robust features that are able to analyze the morphology of identifiers. Therefore, this study proposes a hybrid method of feature extraction and a Naïve Bayes classifier to separate multi-word identifiers within source code. The dataset used in this study is an annotated benchmark containing a large amount of Java code. Multiple experiments were conducted to evaluate the proposed features both independently and in combination. The results show that the combination of all features obtained the best accuracy, achieving an F-measure of 64.7%. This finding indicates the usefulness of the proposed features for discriminating multi-word identifiers.
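The identifier forms named in the description (special characters, CamelCase) can be illustrated with a minimal rule-based sketch. This is not the hybrid feature-extraction and Naïve Bayes method of the article, whose details are not given in this record; the function name `split_identifier` and the regular expressions are assumptions made purely for illustration.

```python
import re
from typing import List

# Hypothetical rule-based baseline for illustration only;
# it is NOT the article's Naive Bayes method.
_CAMEL_BOUNDARY = re.compile(
    r"(?<=[a-z0-9])(?=[A-Z])"      # lowercase/digit followed by uppercase: employee|Name
    r"|(?<=[A-Z])(?=[A-Z][a-z])"   # end of an uppercase run: XML|Parser
)

def split_identifier(identifier: str) -> List[str]:
    """Split on special characters first, then on CamelCase boundaries."""
    words: List[str] = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        if chunk:  # skip empty pieces from leading/trailing separators
            words.extend(_CAMEL_BOUNDARY.split(chunk))
    return words

if __name__ == "__main__":
    # Examples taken from the description above.
    for name in ["Employee-Name", "Employee_ID", "EmployeeName", "EmpNm"]:
        print(name, split_identifier(name))
```

Under these rules, `EmpNm` is cut at the case boundary but its abbreviations are not expanded, and an identifier with no separator or case cue is returned unsplit.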
format Article
author Alanee, Nahla
Azmi Murad, Masrah Azrifah
author_sort Alanee, Nahla
title A hybrid method of feature extraction and naive bayes classification for splitting identifiers
title_sort hybrid method of feature extraction and naive bayes classification for splitting identifiers
publisher Asian Research Publication Network
publishDate 2017
url http://psasir.upm.edu.my/id/eprint/60666/1/A%20hybrid%20method%20of%20feature%20extraction%20and%20naive%20bayes%20classification%20for%20splitting%20identifiers.pdf
http://psasir.upm.edu.my/id/eprint/60666/
http://www.jatit.org
_version_ 1643837413357780992