Source code classification using latent semantic indexing with structural and frequency term weighting

Bibliographic Details
Main Authors: Yusof, Yuhanis; Alhersh, Taha; Mahmuddin, Massudi; Mohamed Din, Aniza
Format: Article
Language: English
Published: Medwell Publishing 2012
Subjects: QA76 Computer software
Online Access:http://repo.uum.edu.my/9501/1/2.pdf
http://repo.uum.edu.my/9501/
http://medwelljournals.com/abstract/?doi=rjasci.2012.266.271
Institution: Universiti Utara Malaysia
id my.uum.repo.9501
record_format eprints
citation Yusof, Yuhanis and Alhersh, Taha and Mahmuddin, Massudi and Mohamed Din, Aniza (2012) Source code classification using latent semantic indexing with structural and frequency term weighting. Research Journal of Applied Sciences, 7 (5). pp. 266-271. ISSN 1815-932X
peer_review PeerReviewed
datestamp 2014-03-24T03:12:53Z
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA76 Computer software
description In recent years, the number of open source software projects has grown, and so has the demand for automatic software classification. Latent Semantic Indexing (LSI) is an information retrieval approach that has been used to classify source code programs. This research proposes an LSI classifier whose term weighting scheme integrates both the structural information and the frequency of terms. Content terms are identified by extracting words from the source code program. In the experiment undertaken, the LSI classifier achieved higher precision and recall than the C4.5 algorithm. The results also indicate that using structural information in the weighting scheme contributes to better classification.
format Article
author Yusof, Yuhanis
Alhersh, Taha
Mahmuddin, Massudi
Mohamed Din, Aniza
title Source code classification using latent semantic indexing with structural and frequency term weighting
publisher Medwell Publishing
publishDate 2012
url http://repo.uum.edu.my/9501/1/2.pdf
http://repo.uum.edu.my/9501/
http://medwelljournals.com/abstract/?doi=rjasci.2012.266.271
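The description above summarizes the method only at a high level: words are extracted from source code, weighted by both structure and frequency, and classified with LSI, and precision and recall are the evaluation measures. The pure-Python sketch below shows one way such a pipeline could look. It is not the authors' implementation, and several details are assumptions: words are obtained by splitting identifiers (`split_identifier`), "structural information" is modelled as whether an identifier names a class, a method or function, or something else, and the role weights in the hypothetical `STRUCT_WEIGHTS`, the log damping, the `KEYWORDS` stop list, the rank `k`, and nearest-centroid classification by cosine similarity are all illustrative choices. The rank-k SVD is computed by power iteration on A^T A so that only the standard library is needed, and training documents and new programs are both projected as U_k^T x (some LSI formulations also scale by the inverse singular values). All names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical structural weights (not from the paper): a word from a class name
# counts more than one from a method/function name, which counts more than others.
STRUCT_WEIGHTS = {"class": 3.0, "method": 2.0, "other": 1.0}
# Illustrative keyword stop list (an assumption; not exhaustive for any language).
KEYWORDS = {"class", "def", "self", "return", "import", "if", "else", "for",
            "while", "public", "private", "static", "void", "new", "this"}

def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase words."""
    return [p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]

def weighted_terms(source):
    """Weight each word by summing the structural weight of every occurrence."""
    roles = {}
    for name in re.findall(r"\bclass\s+(\w+)", source):
        roles[name] = "class"
    for name in re.findall(r"\b(\w+)\s*\(", source):  # calls and definitions
        roles.setdefault(name, "method")
    weights = Counter()
    for ident in re.findall(r"[A-Za-z_]\w*", source):
        if ident in KEYWORDS:
            continue
        for word in split_identifier(ident):
            weights[word] += STRUCT_WEIGHTS[roles.get(ident, "other")]
    # Log-damp the accumulated frequency-times-structure weight (every sum is >= 1).
    return {t: 1.0 + math.log(v) for t, v in weights.items()}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v

def lsi_basis(matrix, k, iters=300):
    """Top-k left singular vectors U_k of a term-by-document matrix (list of rows),
    via power iteration with deflation on A^T A. Adequate only for tiny corpora."""
    m, n = len(matrix), len(matrix[0])
    gram = [[sum(matrix[r][i] * matrix[r][j] for r in range(m)) for j in range(n)]
            for i in range(n)]
    basis = []
    for _ in range(min(k, n)):
        v = normalize([1.0 + 0.1 * j for j in range(n)])
        for _ in range(iters):
            v = normalize([dot(row, v) for row in gram])
        lam = dot(v, [dot(row, v) for row in gram])  # eigenvalue = sigma^2
        if lam <= 1e-10:
            break  # remaining singular values are numerically zero
        sigma = math.sqrt(lam)
        basis.append([dot(matrix[r], v) / sigma for r in range(m)])  # u = A v / sigma
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return basis

def project(vec, basis):
    """Map a term vector into LSI space as U_k^T x."""
    return [dot(u, vec) for u in basis]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

class LSIClassifier:
    """Nearest-centroid classification by cosine similarity in LSI space."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, sources, labels):
        docs = [weighted_terms(s) for s in sources]
        self.vocab = sorted({t for d in docs for t in d})
        cols = [[d.get(t, 0.0) for t in self.vocab] for d in docs]
        self.basis = lsi_basis([list(r) for r in zip(*cols)], self.k)  # terms x docs
        self.centroids = {}
        for col, label in zip(cols, labels):
            p = project(col, self.basis)
            prev = self.centroids.get(label, [0.0] * len(p))
            # Summing points the same way as averaging, which is all cosine needs.
            self.centroids[label] = [a + b for a, b in zip(prev, p)]
        return self

    def predict(self, source):
        d = weighted_terms(source)  # words outside the training vocabulary are ignored
        q = project([d.get(t, 0.0) for t in self.vocab], self.basis)
        return max(self.centroids, key=lambda c: cosine(q, self.centroids[c]))

def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall (0.0 when the denominator is empty)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    return (tp / predicted if predicted else 0.0, tp / actual if actual else 0.0)
```

For instance, `LSIClassifier(k=2).fit(train_sources, train_labels).predict(new_source)` returns the label whose centroid is most cosine-similar to the new program, and `precision_recall(y_true, y_pred, label)` gives the per-class measures used when comparing against a baseline such as C4.5. Power iteration converges slowly when the leading singular values are nearly tied, so on a real corpus a library routine such as `numpy.linalg.svd` would be the more practical choice.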