INTERESTINGNESS MEASURE AND OPTIMIZATION OF GRAPH-PATTERN ASSOCIATION RULE

Bibliographic Details
Main Author: Wahyudi
Format: Dissertations
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/47832
Institution: Institut Teknologi Bandung
id id-itb.:47832
spelling id-itb.:47832 2020-06-22T11:12:46Z INTERESTINGNESS MEASURE AND OPTIMIZATION OF GRAPH-PATTERN ASSOCIATION RULE Wahyudi Indonesia Dissertations knowledge bases, association rule mining, graph-pattern association rule, lift PCA confidence, max-sum diversification INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/47832 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description The continuous progress of information extraction (IE) techniques has led to the construction of large encyclopedic knowledge bases (KBs). These KBs contain millions of facts about real-world entities such as people, organizations, and places. KBs are important because they allow computers to understand the real world, and they are used in applications in information retrieval, query answering, and automatic reasoning, among other fields. Furthermore, the wealth of information available in today's KBs allows frequent patterns to be discovered in the data, for example through association rule mining. Association rules can be used to predict true facts, identify potential errors, and understand the data better, and they can be mined over a graph representation. However, association rule mining has several problems. Research problems for graph-pattern association rules (GPARs) include rule mining and interestingness measures, among others. This dissertation addresses three of them: mining Horn-closed rules using graph patterns under the open world assumption (OWA), misleading association rules, and the generation of an excessive number of rules that often pertain to the same or similar items. The research proceeds in three stages. First, for mining Horn-closed rules using graph patterns, we propose the algorithm RGGP (Rule Generated Graph Pattern). Second, for misleading association rules, we propose an interestingness measure for GPARs on KBs based on lift PCA confidence; at this stage we also propose the algorithm RGKB (Rule Generated Knowledge Bases). Finally, the excessive number of rules that often pertain to the same or similar items is handled with a bijective function, max-sum diversification, to obtain the optimum value of association rules based on interestingness measures and diversity.
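The diversification step can be illustrated with a small greedy sketch. This is not the dissertation's algorithm, which this record does not describe. It assumes a common form of the max-sum objective (sum of interestingness scores plus a weight `lam` times the sum of pairwise distances). The rule names, the scores, and the Jaccard distance over body predicates are all hypothetical choices made for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """One minus (size of intersection / size of union) for two predicate sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def objective(selected, score, body, lam):
    """Assumed max-sum objective: total score plus lam times total pairwise distance."""
    relevance = sum(score[r] for r in selected)
    diversity = sum(
        jaccard_distance(body[a], body[b]) for a, b in combinations(selected, 2)
    )
    return relevance + lam * diversity


def greedy_max_sum(score, body, k, lam=1.0):
    """Greedy heuristic: repeatedly add the rule with the largest marginal gain."""
    selected = []
    remaining = sorted(score)  # sorted so that ties break deterministically
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda r: score[r]
            + lam * sum(jaccard_distance(body[r], body[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical rules that share one consequent; scores stand in for an
# interestingness measure, bodies are the predicates in each antecedent.
score = {"r1": 0.9, "r2": 0.85, "r3": 0.5}
body = {
    "r1": {"bornIn", "citizenOf"},
    "r2": {"bornIn", "citizenOf"},
    "r3": {"worksAt", "locatedIn"},
}
top2 = greedy_max_sum(score, body, k=2)
```

With `lam=1.0` the sketch prefers the lower-scoring but dissimilar `r3` over the near-duplicate `r2`; with `lam=0.0` it falls back to ranking by score alone.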
Lift PCA confidence is an improvement on lift confidence and is proposed as an interestingness measure for GPARs with probabilistic correlation. After Horn-closed rules are mined using graph patterns, confidence measures are computed for the GPARs. Four measures are used: two strength measures, standard confidence and PCA confidence, and two interestingness measures, lift confidence and lift PCA confidence. The RGKB algorithm is used to generate association rules whose coverage coefficient (CC) is greater than a threshold. The interestingness measures are computed by the confidence measure (CM) algorithm. We used Yago2 KB core, Yago2s KB, DBPedia 3.8 KB, and Wikidata KB as datasets. The diversity of graph-pattern association rules is measured using max-sum diversification for GPARs that have the same consequent.
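The four measures named above can be sketched on a toy KB. This record does not give the dissertation's formulas, so everything here is an assumption: PCA confidence is read in the partial-completeness-assumption sense used in the KB rule-mining literature, and lift is taken as confidence divided by a baseline head probability. The facts, the rule `livesIn(x, y) <= bornIn(x, y)`, and the baseline definition are hypothetical.

```python
def confidences(body, head):
    """Return (support, standard confidence, PCA confidence) for a rule.

    body and head are sets of (x, y) pairs satisfying the antecedent and
    the consequent relation. Assumed PCA reading: a body pair counts as a
    counterexample only if x has at least one known head fact.
    """
    support = len(body & head)
    std_conf = support / len(body) if body else 0.0
    known_subjects = {x for x, _ in head}
    pca_body = {(x, y) for x, y in body if x in known_subjects}
    pca_conf = support / len(pca_body) if pca_body else 0.0
    return support, std_conf, pca_conf


def head_probability(body, head):
    """Assumed baseline: share of all subject-object pairs that are head facts."""
    subjects = {x for x, _ in body | head}
    objects = {y for _, y in body | head}
    return len(head) / (len(subjects) * len(objects))


# Hypothetical facts for the rule livesIn(x, y) <= bornIn(x, y).
born_in = {("ana", "paris"), ("ben", "rome"), ("cy", "oslo"), ("dee", "lima")}
lives_in = {("ana", "paris"), ("ben", "milan"), ("eve", "rome")}

support, std_conf, pca_conf = confidences(born_in, lives_in)
baseline = head_probability(born_in, lives_in)
lift_conf = std_conf / baseline      # lift over standard confidence
lift_pca_conf = pca_conf / baseline  # assumed reading of "lift PCA confidence"
```

In this toy data only `ana` and `ben` have any `livesIn` fact, so the PCA denominator shrinks from four body pairs to two and the PCA-based values come out twice as large as the standard ones.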
format Dissertations
author Wahyudi
title INTERESTINGNESS MEASURE AND OPTIMIZATION OF GRAPH-PATTERN ASSOCIATION RULE
url https://digilib.itb.ac.id/gdl/view/47832
_version_ 1823641364871512064