AUTOMATIC IDENTIFICATION OF UML USE CASE DIAGRAMS USING IMAGE FEATURE EXTRACTION AND MACHINE LEARNING

Bibliographic Details
Main Author: WILIUDARSAN (NIM: 13513002), IRENE
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/22602
Institution: Institut Teknologi Bandung
Description
Summary: The many use case diagrams that are publicly available could serve as data for empirical research in software engineering. However, because these diagrams are usually stored as images, such research is difficult to carry out. A tool that can identify use case diagrams from images is therefore needed. In this thesis, such a tool is developed using image feature extraction and machine learning.

The development process is divided into several phases: gathering training and testing data, developing the image feature extraction program, experimentation, and testing. Training and testing data were gathered by crawling Google Images search results. Image feature extraction uses the Canny edge detector for edge detection, Suzuki85 and Ramer-Douglas-Peucker for contour extraction, and the Progressive Probabilistic Hough Transform for line detection. These are followed by shape detection based on the characteristics of each shape and, finally, by computing features derived from an analysis of the characteristics of use case diagrams. To improve identification accuracy, machine learning is applied to the extracted image features. The classification methods used are Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression.

Based on experiments using 10-fold cross-validation and on testing against the collected data, the best performance was achieved by the Random Forest algorithm with 15 extracted image features. On the testing data, the resulting model identified use case diagrams with 0.903 precision, 0.892 recall, 0.898 F-measure, and 0.898 accuracy.
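The summary names Ramer-Douglas-Peucker (RDP) as one step of contour extraction. The following is a minimal, self-contained Python sketch of RDP under stated assumptions, not the thesis's own code. It assumes contours have already been traced (for example by Suzuki85 border following) into lists of points, with the first point repeated at the end of a closed contour. The helper `vertex_count`, the tolerance `0.5`, and the sample `rectangle` and `ellipse` contours are hypothetical illustrations. They show one plausible shape heuristic: a rectangle-like outline (such as a system boundary) collapses to 4 corners, while an oval (such as a use case) keeps more.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b.

    If a == b (as for a closed contour whose first and last points coincide),
    fall back to the plain Euclidean distance from p to a.
    """
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker: simplify a polyline, always keeping both endpoints."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Locate the interior point farthest from the start-end chord.
    max_dist, index = -1.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Keep that point and simplify each half recursively;
        # drop the duplicated split point when joining the halves.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    # Every interior point is within epsilon of the chord: keep only the endpoints.
    return [start, end]


def vertex_count(closed_contour: List[Point], epsilon: float) -> int:
    """Hypothetical helper: number of corners left after simplifying a closed
    contour whose last point repeats its first point."""
    return len(rdp(closed_contour, epsilon)) - 1


# Hypothetical data: a slightly noisy rectangle (e.g. a system boundary)...
rectangle = [(0.0, 0.0), (5.0, 0.1), (10.0, 0.0), (9.9, 2.5), (10.0, 5.0),
             (5.0, 5.1), (0.0, 5.0), (0.1, 2.5), (0.0, 0.0)]
# ...and an ellipse sampled every 10 degrees (e.g. a use case oval).
ellipse = [(10 * math.cos(math.radians(t)), 5 * math.sin(math.radians(t)))
           for t in range(0, 360, 10)]
ellipse.append(ellipse[0])

print(vertex_count(rectangle, 0.5))  # 4
print(vertex_count(ellipse, 0.5))    # more than 4
```

The summary does not say which library was used. OpenCV provides these steps as `cv2.Canny`, `cv2.findContours` (Suzuki85 border following), `cv2.approxPolyDP` (Douglas-Peucker), and `cv2.HoughLinesP` (progressive probabilistic Hough transform). A hand-written version like the one above is mainly useful for seeing how the tolerance controls how many corners survive.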