DEVELOPMENT OF SEQUENCE CLUSTERING ON PROCESS MINING FOR BUSINESS PROCESS ANALYSIS USING K-MEANS

Bibliographic Details
Main Author: Nur Fitrianti Fahrudin (NIM: 23516007)
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/29783
Institution: Institut Teknologi Bandung
Description
Summary: Process discovery, a major part of process mining, aims to produce a process model from an event log. An event log is a record of the activities of business processes that have been executed and recorded in an information system. Event logs are used to analyze the current state of a company, which is one of the goals of process mining. In practice, however, applying process mining often runs into problems: a very large number of business process variants makes the model produced by process discovery difficult to understand. To address this, the event log can be partitioned into groups of similar sequences. This method is known as sequence clustering, an additional step carried out before process discovery. Sequence clustering has been shown to make the models produced by process discovery simpler.

In previous research, a first-order Markov chain was used as the clustering method. Each cluster is represented by a transition matrix. Because the cluster assignments were not known in advance, the researchers used the Expectation-Maximization method to estimate the transition matrix of each cluster. Each sequence is assigned to the cluster under which it has the highest probability. However, testing the clustering results showed that the fitness and precision of the resulting process models often decreased compared with the process model discovered from the unclustered event log. This thesis therefore develops a sequence clustering methodology that can improve fitness and precision.

K-Means is chosen as the clustering method. Applying K-Means to sequence clustering increases the fitness and precision of the models produced at the process discovery stage. However, the number of clusters must be chosen carefully: a poorly chosen number of clusters can reduce the fitness and precision of the resulting model.
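
As a rough illustration of the idea summarized above, the following Python sketch partitions an event log with K-Means before process discovery. It is not the thesis implementation. The abstract does not say how traces are turned into vectors, so the encoding used here (relative frequencies of directly-follows activity pairs) is an assumption, and the names `encode_traces`, `kmeans`, and `split_log` are hypothetical.

```python
import random
from collections import Counter


def encode_traces(traces):
    """Encode each trace as relative frequencies of directly-follows pairs.

    Assumption: this feature encoding is chosen for illustration only;
    the abstract does not state which encoding the thesis uses.
    """
    pairs = sorted({(a, b) for t in traces for a, b in zip(t, t[1:])})
    vectors = []
    for t in traces:
        counts = Counter(zip(t, t[1:]))
        total = sum(counts.values()) or 1  # avoid division by zero for short traces
        vectors.append([counts[p] / total for p in pairs])
    return vectors


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain K-Means (Lloyd's algorithm); returns one cluster label per vector."""
    # Initialise centroids from distinct vectors so no two start identical.
    unique = [list(v) for v in dict.fromkeys(tuple(v) for v in vectors)]
    if len(unique) < k:
        raise ValueError("k exceeds the number of distinct trace vectors")
    centroids = random.Random(seed).sample(unique, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_log(traces, labels):
    """Group traces into sub-logs, one per cluster label."""
    sublogs = {}
    for trace, label in zip(traces, labels):
        sublogs.setdefault(label, []).append(trace)
    return sublogs


# Toy event log: each trace is the ordered list of activities of one case.
log = [["a", "b", "c"], ["a", "b", "c"], ["x", "y", "z"], ["x", "y", "z"]]
sublogs = split_log(log, kmeans(encode_traces(log), k=2))
```

Each sub-log would then be passed to a process discovery algorithm separately. Since the number of clusters affects the result, one plausible approach is to repeat this for several values of `k` and compare the fitness and precision of the discovered models.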