Design and development of web crawler for indexing an intranet.

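The paper discusses the database and program design of a Web crawler. Crawler performance was found to improve when hyperlinks were stored as three distinct types: internal HTML URL addresses, internal non-HTML URL addresses, and external URL addresses. In addition, storing word locations, such as the line number and the word sequence number within the Web page, allows phrases or strings of words to be searched and identified effectively. Better handling of comments, scripts, and tags by the Web crawler was found to increase data quality in the data collection process. Enhancing the tag identification modules to handle tags with attributes was also found to make the Web crawler more effective at retrieval during the data collection process.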

Bibliographic Details
Main Author:	Lee, Chee Onn.
Other Authors:	Wee Kim Wee School of Communication and Information
Format:	Theses and Dissertations
Published:	2008
Subjects:	DRNTU::Library and information science::Libraries::Technologies
Online Access:	http://hdl.handle.net/10356/1523
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-1523
record_format	dspace
spelling	sg-ntu-dr.10356-1523 2019-12-10T12:02:33Z Design and development of web crawler for indexing an intranet. Lee, Chee Onn. Wee Kim Wee School of Communication and Information DRNTU::Library and information science::Libraries::Technologies The paper discusses database and program design of the Web crawler. The Web crawler performance was found to improve by storing the hyperlinks into three different types namely internal HTML URL addresses, internal non-HTML URL addresses and external URL addresses. In addition, the storing of word location such as line number and the word sequence number of the Web page allows phrases or strings of words to be searched and identified effectively. Better handling of comments, scripts and tags by the Web crawler was found to increase the data quality in the data collection process. It was found that enhancing the tags identification modules to handle the tags with attributes allows the Web crawler to be more effective in the retrieval of the data collection process. Master of Science (Information Studies) 2008-09-10T08:34:02Z 2008-09-10T08:34:02Z 2005 2005 Thesis http://hdl.handle.net/10356/1523 Nanyang Technological University application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
topic	DRNTU::Library and information science::Libraries::Technologies
spellingShingle	DRNTU::Library and information science::Libraries::Technologies Lee, Chee Onn. Design and development of web crawler for indexing an intranet.
description	The paper discusses the database and program design of a Web crawler. Crawler performance was found to improve when hyperlinks were stored as three distinct types: internal HTML URL addresses, internal non-HTML URL addresses, and external URL addresses. In addition, storing word locations, such as the line number and the word sequence number within the Web page, allows phrases or strings of words to be searched and identified effectively. Better handling of comments, scripts, and tags by the Web crawler was found to increase data quality in the data collection process. Enhancing the tag identification modules to handle tags with attributes was also found to make the Web crawler more effective at retrieval during the data collection process.
author2	Wee Kim Wee School of Communication and Information
author_facet	Wee Kim Wee School of Communication and Information Lee, Chee Onn.
format	Theses and Dissertations
author	Lee, Chee Onn.
author_sort	Lee, Chee Onn.
title	Design and development of web crawler for indexing an intranet.
title_short	Design and development of web crawler for indexing an intranet.
title_full	Design and development of web crawler for indexing an intranet.
title_fullStr	Design and development of web crawler for indexing an intranet.
title_full_unstemmed	Design and development of web crawler for indexing an intranet.
title_sort	design and development of web crawler for indexing an intranet.
publishDate	2008
url	http://hdl.handle.net/10356/1523
_version_	1681036447959744512

Similar Items