Searching patterns for relation extraction over the Web: Rediscovering the pattern-relation duality

While tuple extraction for a given relation has been an active research area, its dual problem of pattern search, to find and rank patterns in a principled way, has not been studied explicitly. In this paper, we propose and address the problem of pattern search, in addition to tuple extraction. As o...

Bibliographic Details
Main Authors: FANG, Yuan; CHANG, Kevin Chen-Chuan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2011
Subjects: Algorithms; Experimentation; Design; Performance; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/4063
https://ink.library.smu.edu.sg/context/sis_research/article/5066/viewcontent/p825_fang.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5066
record_format dspace
spelling sg-smu-ink.sis_research-5066 2018-07-20T05:00:13Z
2011-02-01T08:00:00Z
application/pdf
info:doi/10.1145/1935826.1935933
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Algorithms
Experimentation
Design
Performance
Databases and Information Systems
description While tuple extraction for a given relation has been an active research area, its dual problem of pattern search, to find and rank patterns in a principled way, has not been studied explicitly. In this paper, we propose and address the problem of pattern search, in addition to tuple extraction. As our objectives, we stress reusability for pattern search and scalability of tuple extraction, such that our approach can be applied to very large corpora like the Web. As the key foundation, we propose a conceptual model, PRDualRank, to capture the notion of precision and recall for both tuples and patterns in a principled way, leading to the "rediscovery" of the Pattern-Relation Duality: the formal quantification of the reinforcement between patterns and tuples with the metrics of precision and recall. We also develop a concrete framework for PRDualRank, guided by the principles of a perfect sampling process over a complete corpus. Finally, we evaluated our framework over the real Web. Experiments show that on all three target relations our principled approach greatly outperforms the previous state-of-the-art system in both effectiveness and efficiency. In particular, we improved optimal F-score by up to 64%.
format text
author FANG, Yuan
CHANG, Kevin Chen-Chuan
title Searching patterns for relation extraction over the Web: Rediscovering the pattern-relation duality
publisher Institutional Knowledge at Singapore Management University
publishDate 2011
url https://ink.library.smu.edu.sg/sis_research/4063
https://ink.library.smu.edu.sg/context/sis_research/article/5066/viewcontent/p825_fang.pdf
_version_ 1770574207091998720