ARISE-PIE: A People Information Integration Engine over the Web

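Searching for people information on the Web is a common everyday practice. However, searching for such information manually is time-consuming. In this paper, we aim to develop an automatic people information search system, named ARISE-PIE. To build such a system, we tackle two major technical challenges: data harvesting and data integration. For data harvesting, we study how to leverage a search engine to help crawl the relevant Web pages for a target entity; we then propose a novel learning-to-query model that can automatically select a set of "best" queries to maximize collective utility (e.g., precision or recall). For data integration, we study how to leverage flexible forms of constraints as weak supervision to achieve collective information extraction from a target entity’s Web page corpus; we then propose a novel conditional probabilistic formulation to model constraints and an efficient realization that enables inference with constraints. We evaluate our data harvesting and data integration solutions on real-world data sets and show that both achieve better performance than state-of-the-art baselines. We also evaluate the whole system on a benchmark data set and through a user study, both of which show promising results.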
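The abstract does not spell out how the learning-to-query model works. As a point of reference only, the sketch below shows a plain greedy baseline for the same goal of picking a query set that maximizes collective recall. It assumes we already know which pages each candidate query returns and which pages are relevant to the target entity. The function names, query strings, and page IDs are hypothetical, and this is not the authors' learned model.

```python
from typing import Dict, List, Optional, Set


def greedy_select_queries(
    results: Dict[str, Set[str]],
    relevant: Set[str],
    budget: int,
) -> List[str]:
    """Pick up to `budget` queries whose combined results cover the most relevant pages.

    results  -- hypothetical map from a candidate query to the page IDs it returns
    relevant -- hypothetical set of page IDs known to be about the target entity
    """
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best_query: Optional[str] = None
        best_gain = 0
        for query, pages in results.items():
            if query in selected:
                continue
            # Marginal gain: relevant pages this query adds beyond those already covered.
            gain = len((pages & relevant) - covered)
            if gain > best_gain:
                best_query, best_gain = query, gain
        if best_query is None:  # no remaining query adds a new relevant page
            break
        selected.append(best_query)
        covered |= results[best_query] & relevant
    return selected


def collective_recall(
    selected: List[str], results: Dict[str, Set[str]], relevant: Set[str]
) -> float:
    """Fraction of relevant pages retrieved by the union of the selected queries."""
    retrieved: Set[str] = set().union(*(results[q] for q in selected))
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Toy, made-up data: four candidate queries for a hypothetical person "Jane Doe".
    results = {
        "Jane Doe": {"a", "b", "x"},
        "Jane Doe professor": {"a", "c"},
        "Jane Doe homepage": {"b", "c", "d"},
        "Jane Doe Singapore": {"d", "y"},
    }
    relevant = {"a", "b", "c", "d"}
    picked = greedy_select_queries(results, relevant, budget=2)
    print(picked, collective_recall(picked, results, relevant))
```

On this toy data the greedy pass first takes "Jane Doe homepage" (three new relevant pages), then "Jane Doe" (one more), reaching full recall with two queries. Because each query is scored by its marginal contribution to the set rather than in isolation, the selection targets collective utility; optimizing precision instead would need a different scoring rule.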

Bibliographic Details
Main Authors: ZHENG, Vincent W.; HOANG, Tao; CHEN, Penghe; FANG, Yuan; YANG, Xiaoyan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects: Web crawling; Data extraction and integration; Data mining; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/4058
https://ink.library.smu.edu.sg/context/sis_research/article/5061/viewcontent/arisepi_ddta_cikm2016.pdf
Institution: Singapore Management University
Date: 2016-10-01
Collection: Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/