INTERESTINGNESS MEASURE AND OPTIMIZATION OF GRAPH-PATTERN ASSOCIATION RULE
Main Author: | |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/47832 |
Institution: | Institut Teknologi Bandung |
Summary: | The continuous progress of information extraction (IE) techniques has led to
the construction of large encyclopedic knowledge bases (KBs). These KBs contain
millions of facts about real-world entities such as people, organizations, and places.
KBs are important because they allow computers to understand the real
world. They are used in many applications in information retrieval, query
answering, and automatic reasoning, among other fields. Furthermore, the wealth
of information available in today's KBs allows for the discovery of frequent patterns
in the data, for example through association rule mining. Association rules can be used
for predicting true facts, identifying potential errors, and understanding the data better.
Association rules can be mined using a graph representation. However, association rule mining still has open problems. Research problems for graph-pattern association rules (GPARs) include rule mining and interestingness measures, among
others.
This dissertation addresses three problems of association rules: mining Horn-closed rules using graph patterns under the open-world assumption (OWA), misleading association rules, and
the generation of an excessive number of rules that often pertain to the same or similar
items. The research proceeds in three stages.
First, for mining Horn-closed rules using graph patterns, we propose the algorithm
RGGP (Rule Generated Graph Pattern). Second, for misleading association rules,
we propose an interestingness measure for GPARs on KBs, lift PCA (partial completeness assumption) confidence. At
this stage, we also propose the algorithm RGKB (Rule Generated Knowledge Bases).
Finally, the excessive number of rules that pertain to the same or similar items
is addressed using a bijective function, max-sum diversification, to obtain the optimum
value of the association rules based on interestingness measures and diversity.
Lift PCA confidence is an improvement on lift confidence and is proposed as
an interestingness measure for GPARs with probabilistic correlation. After
Horn-closed rules are mined using graph patterns, confidence measures are computed for the GPARs.
Two strength measures are used, standard confidence
and PCA confidence, together with two interestingness measures, lift confidence and
lift PCA confidence. The algorithm Rule Generated Knowledge Bases (RGKB) is used to
generate association rules whose coverage coefficient (CC) is greater than a threshold.
The interestingness measures are computed by the confidence measure (CM) algorithm. We
used Yago2 KB core, Yago2s KB, DBPedia 3.8 KB, and Wikidata KB as datasets.
The diversity of graph-pattern association rules is measured using max-sum
diversification for GPARs that have the same consequent.
|
---|
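The two strength measures named in the summary, standard confidence and PCA confidence, can be illustrated on a toy knowledge base. The sketch below is not the dissertation's RGGP, RGKB, or CM algorithm. It uses the definitions commonly given for these two measures in rule mining under the open-world assumption, and the facts, the example rule, and all function names are assumptions made up for illustration. The lift variants are not defined in the summary, so they are not reproduced here.

```python
from collections import defaultdict

# Toy KB of (subject, relation, object) triples. All facts are invented.
KB = {
    ("anna", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
    ("carl", "livesIn", "rome"),
    ("dora", "livesIn", "rome"),
    ("anna", "citizenOf", "france"),
    ("carl", "citizenOf", "france"),
    ("paris", "locatedIn", "france"),
    ("rome", "locatedIn", "italy"),
}


def pairs(relation):
    """All (subject, object) pairs stored in the KB for one relation."""
    return {(s, o) for s, r, o in KB if r == relation}


def body_predictions():
    """Pairs (x, y) predicted by the hypothetical Horn rule
    livesIn(x, z) AND locatedIn(z, y) => citizenOf(x, y)."""
    located = defaultdict(set)
    for z, y in pairs("locatedIn"):
        located[z].add(y)
    return {(x, y) for x, z in pairs("livesIn") for y in located[z]}


def support():
    """Number of predictions that are confirmed by a head fact in the KB."""
    return len(body_predictions() & pairs("citizenOf"))


def standard_confidence():
    """Support divided by all predictions: every prediction missing from
    the KB is counted as a counterexample (closed-world reading)."""
    return support() / len(body_predictions())


def pca_confidence():
    """Support divided by the predictions for subjects that already have
    at least one citizenOf fact; subjects with no such fact are treated
    as unknown rather than as counterexamples."""
    known_subjects = {x for x, _ in pairs("citizenOf")}
    pca_body = {(x, y) for x, y in body_predictions() if x in known_subjects}
    return support() / len(pca_body)


if __name__ == "__main__":
    print("support:", support())
    print("standard confidence:", standard_confidence())
    print("PCA confidence:", pca_confidence())
```

In this toy KB the rule makes four predictions, of which one is confirmed. PCA confidence ignores the two subjects with no known citizenship, so it is higher than standard confidence for the same rule.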