Dash: A Novel Search Engine for Database-Generated Dynamic Web Pages

Bibliographic Details
Main Authors: LEE, Ken C. K., BANKAR, Kanchan, ZHENG, Baihua, CHOW, Chi-Yin, WANG, Honggang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/1623
https://ink.library.smu.edu.sg/context/sis_research/article/2622/viewcontent/ZhangBHICDCS_2012.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-2622
record_format dspace
spelling sg-smu-ink.sis_research-2622 2018-12-05T06:22:54Z 2012-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1623 info:doi/10.1109/ICDCS.2012.53 https://ink.library.smu.edu.sg/context/sis_research/article/2622/viewcontent/ZhangBHICDCS_2012.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
description Database-generated dynamic web pages (db-pages for short), whose contents are created on the fly by web applications and databases, are now prominent on the web. However, many of them cannot be searched by existing search engines. Accordingly, we develop a novel search engine named Dash, which stands for Db-pAge SearcH, to support db-page search. Dash determines the db-pages that a target web application and its database can possibly generate by exploring the application code and the related database content, and it supports keyword search on those db-pages. In this paper, we present its system design and focus on efficiency. To minimize the costs of collecting, maintaining, indexing and searching a massive number of db-pages that may have overlapping contents, Dash derives and indexes db-page fragments in place of db-pages. Each db-page fragment carries a disjoint part of a db-page. To efficiently compute and index db-page fragments from huge datasets, Dash is equipped with MapReduce-based algorithms for database crawling and db-page fragment indexing. In addition, Dash has a top-k search algorithm that efficiently assembles db-page fragments into db-pages relevant to the search keywords and returns the k most relevant ones. The performance of Dash is evaluated via extensive experimentation.
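The fragment idea in the description can be illustrated with a small sketch. It is not the algorithm from the paper: the sample data, the names `FRAGMENTS`, `PAGES`, `build_fragment_index` and `top_k_pages`, and the simple term-count scoring are all assumptions made for illustration. It only shows the general pattern of indexing disjoint fragments once and assembling page scores from them at query time.

```python
import heapq
from collections import Counter, defaultdict

# Hypothetical data: each fragment holds a disjoint piece of content,
# and each db-page is described only by the fragment ids it is built from.
FRAGMENTS = {
    "f1": "thai restaurant in new york",
    "f2": "burger restaurant in new york",
    "f3": "customer review great thai curry",
}
PAGES = {
    "page_a": ["f1", "f3"],
    "page_b": ["f2"],
    "page_c": ["f1", "f2"],
}

def build_fragment_index(fragments):
    """Map each keyword to {fragment_id: term frequency}."""
    index = defaultdict(dict)
    for frag_id, text in fragments.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][frag_id] = count
    return index

def top_k_pages(query, index, pages, k):
    """Score pages by summing keyword hits over their fragments; return the k best."""
    frag_scores = Counter()
    for term in query.lower().split():
        for frag_id, count in index.get(term, {}).items():
            frag_scores[frag_id] += count
    # Assemble page scores from fragment scores (assumed scoring: plain sum).
    page_scores = {
        page_id: sum(frag_scores[f] for f in frag_ids)
        for page_id, frag_ids in pages.items()
    }
    ranked = heapq.nlargest(k, page_scores.items(), key=lambda kv: (kv[1], kv[0]))
    # Drop pages that match none of the keywords.
    return [(page_id, score) for page_id, score in ranked if score > 0]

if __name__ == "__main__":
    index = build_fragment_index(FRAGMENTS)
    print(top_k_pages("thai", index, PAGES, k=2))
```

Because fragment `f1` is shared by two pages, it is indexed once and reused when either page is scored, which is the storage saving the description attributes to indexing fragments instead of whole pages.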