THE STRATEGIES OF FP-GROWTH ALGORITHM IMPLEMENTATION TO SUPPORT ASSOCIATION RULE MINING IN SPATIO-TEMPORAL DATA

Bibliographic Details
Main Author: MUKHLASH (NIM: 335 05 001); Supervisory team: Prof. Dr. Ir. Benhard Sitohang; Dr. Ir. D. Muhall, IMAM
Format: Dissertations
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/16022
Institution: Institut Teknologi Bandung
Description
Summary: Research on data mining in the geo-spatio-temporal domain is a relatively new field of study. Several data mining techniques have been developed for it, including spatio-temporal association rule mining. A spatio-temporal association rule is an extension of a spatial association rule with time constraints. Spatio-temporal data stores spatial objects and their changes over time. A rule is spatio-temporal if there are spatio-temporal relations in its antecedent or in its consequent. The main focus of this research is the development of association rule algorithms for spatial data with added time constraints. Two important aspects of the search for spatio-temporal association rules are the data preprocessing method and the algorithm for generating frequent predicates; both are major subjects of this research. The data preprocessing method processes the raw spatial and non-spatial data and produces data that is ready for mining. Frequent predicates are generated by developing a compact data structure and an algorithm that accesses the data in it efficiently. The data structure, named FP-Tree*, is based on the FP-Tree and adapted to spatio-temporal data. This structure was chosen because a tree is more memory-efficient than other forms, since it reduces the data size significantly. Once the tree has been built, an algorithm obtains the spatio-temporal frequent predicates from it. This algorithm, named FP-Growth*, is a development of the FP-Growth algorithm. It was chosen because it avoids generating candidate frequent itemsets and uses a divide-and-conquer method to decompose the mining process, which reduces the search space dramatically.
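
The tree-based mining described above can be sketched with the standard FP-Growth algorithm. This is only an illustration under stated assumptions, not the dissertation's FP-Tree* or FP-Growth*: each transaction is assumed to be a set of predicate strings already produced by preprocessing, and the predicate names (such as `near(river)`) and the helper names `Node`, `build_tree`, `fp_growth`, and `mine` are hypothetical.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item (predicate), a count, and child links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and counts."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += count  # shared prefixes are stored once
    return header, freq

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (frequent_itemset, support_count) without candidate generation."""
    header, freq = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, freq[item]
        # Conditional pattern base: the prefix path of every node carrying `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database recursively.
        yield from fp_growth(conditional, min_support, pattern)

def mine(transactions, min_support):
    """Convenience wrapper: plain sets in, {itemset: support_count} out."""
    weighted = [(list(t), 1) for t in transactions]
    return {frozenset(p): c for p, c in fp_growth(weighted, min_support)}

# Hypothetical preprocessed records: one set of predicates per spatial object and year.
rows = [
    {"year=2005", "near(river)", "dengue=high"},
    {"year=2005", "near(river)", "dengue=high", "density=high"},
    {"year=2006", "near(river)", "dengue=high"},
    {"year=2006", "density=high"},
]
result = mine(rows, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Treating the temporal predicate (`year=...`) as an ordinary item is a simplification chosen for brevity; how FP-Tree* actually encodes time constraints is not stated in this record.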

The developed algorithms have been used to support decision-making by integrating them into GIS software. The resulting system can analyze spatio-temporal health and demographic data and produce knowledge in the form of spatio-temporal association rules. To test the algorithms and data structures, computational time and memory requirements were measured empirically. The generated association rules were tested using objective and subjective interestingness measures.

Based on the overall results of the study, it was concluded that the FP-Growth*-based search algorithm can find association rules involving spatial and temporal data. In the test cases used, the algorithm found associations between demographic data and health data, both spatial and temporal. These associations are expected to support the decision-making process. In addition, the computational time is influenced by the support threshold, the confidence, and the length of the pattern. The higher the support threshold, the faster the algorithm executes; changes in confidence and pattern length affect the execution time likewise. The additional execution time caused by lengthening the pattern grows linearly, so it can be concluded that the FP-Growth*-based algorithm is scalable. Compared with the Apriori-based algorithm (named Apriori*), FP-Growth* is much faster for long patterns (more than six fields). For patterns shorter than six fields, Apriori* executes faster than FP-Growth*.

To enhance the benefits of the application, the frequent predicate mining algorithms should be developed further to support other data mining tasks. One example is spatio-temporal co-location pattern mining, which finds similarities in object properties at adjacent locations and times (spatial neighborhood and temporal neighborhood).
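
To make the support and confidence thresholds discussed in the summary concrete, the sketch below computes both measures for a single rule using their standard definitions. The records, predicate names, and the helpers `support_count` and `rule_metrics` are hypothetical and serve only as an illustration; they are not taken from the dissertation.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every predicate in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = support_count(transactions, antecedent | consequent)
    ante = support_count(transactions, antecedent)
    support = both / n if n else 0.0             # share of all records matching the rule
    confidence = both / ante if ante else 0.0    # share of antecedent matches that also match
    return support, confidence

# Hypothetical preprocessed records: one set of predicates per spatial object and year.
records = [
    {"year=2005", "near(river)", "dengue=high"},
    {"year=2005", "near(river)", "dengue=high", "density=high"},
    {"year=2006", "near(river)", "dengue=high"},
    {"year=2006", "density=high"},
]

strong = rule_metrics(records, {"year=2005", "near(river)"}, {"dengue=high"})
weak = rule_metrics(records, {"density=high"}, {"dengue=high"})
print(strong)  # (0.5, 1.0)
print(weak)    # (0.25, 0.5)
```

With a minimum support of 0.5, only the first rule would survive, which illustrates why a higher support threshold leaves fewer patterns to examine.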