Design and development of web crawler for indexing an intranet.

Bibliographic Details
Main Author:	Lee, Chee Onn.
Other Authors:	Wee Kim Wee School of Communication and Information
Format:	Theses and Dissertations
Published:	2008
Subjects:	DRNTU::Library and information science::Libraries::Technologies
Online Access:	http://hdl.handle.net/10356/1523

Description
Summary:	The paper discusses the database and program design of the Web crawler. The crawler's performance was found to improve when hyperlinks were stored as three separate types, namely internal HTML URL addresses, internal non-HTML URL addresses and external URL addresses. In addition, storing each word's location, such as its line number and its word sequence number within the Web page, allows phrases or strings of words to be searched and identified effectively. Better handling of comments, scripts and tags by the crawler was found to increase data quality during data collection. Enhancing the tag identification modules to handle tags with attributes was also found to make the crawler more effective at retrieving data during the data collection process.
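The record gives no implementation details. As a rough illustration only, the following Python sketch (standard library only) shows three of the ideas in the summary. It sorts hyperlinks into the three stated types, records each word's line number and sequence number so that phrases can be matched, and ignores comments and script text while still reading tags that carry attributes. It is not the thesis's design. The names `PageParser`, `classify_link`, `index_page` and `phrase_search`, the host `intranet.example.local`, the in-memory dictionary standing in for the crawler's database, and the extension-based test for what counts as an HTML page are all hypothetical assumptions.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical heuristic: a path with one of these extensions (or none)
# counts as an HTML page; anything else is a non-HTML resource.
HTML_EXTENSIONS = {"", ".html", ".htm", ".php", ".asp"}
WORD_RE = re.compile(r"[a-z0-9]+")


class PageParser(HTMLParser):
    """Collects hyperlinks and positioned words from one HTML page.

    Attributes arrive as (name, value) pairs, so <a class="nav" href=...>
    matches like <a href=...>. Comments (handle_comment is a no-op) and
    text inside <script>/<style> are never indexed.
    """

    def __init__(self):
        super().__init__()
        self.links = []       # raw href values, in document order
        self.words = []       # (source line number, sequence number, word)
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        start_line, _ = self.getpos()  # source line where this text begins
        for offset, line in enumerate(data.split("\n")):
            for word in WORD_RE.findall(line.lower()):
                self.words.append((start_line + offset, len(self.words), word))


def classify_link(page_url, href, intranet_host):
    """Return (kind, absolute_url), kind being one of the three link types."""
    url = urljoin(page_url, href)
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.hostname != intranet_host:
        return "external", url
    ext = posixpath.splitext(parts.path)[1].lower()
    if ext in HTML_EXTENSIONS:
        return "internal_html", url
    return "internal_non_html", url


def index_page(index, page_url, words):
    """Add each word's (page, line number, sequence number) to the index."""
    for line_no, seq, word in words:
        index.setdefault(word, []).append((page_url, line_no, seq))


def phrase_search(index, phrase):
    """Return (page, line number) wherever the phrase's words are consecutive."""
    terms = WORD_RE.findall(phrase.lower())
    if not terms:
        return []
    # For each term, the set of (page, sequence number) where it occurs.
    occurs = [{(p, s) for p, _, s in index.get(t, [])} for t in terms]
    return [
        (page, line_no)
        for page, line_no, seq in index.get(terms[0], [])
        if all((page, seq + k) in occurs[k] for k in range(1, len(terms)))
    ]


# Usage on a small made-up page.
html = """<html><head><script>var x = "hidden words";</script></head>
<body><!-- internal comment -->
<p class="intro">Welcome to the staff intranet.</p>
<a class="nav" href="/policies/leave.html">Leave policy</a>
<a href="/forms/claim.pdf">Claim form</a>
<a href="http://www.example.com/">Partner site</a>
</body></html>"""
page = "http://intranet.example.local/index.html"

parser = PageParser()
parser.feed(html)
parser.close()

buckets = {"internal_html": [], "internal_non_html": [], "external": []}
for href in parser.links:
    kind, url = classify_link(page, href, "intranet.example.local")
    buckets[kind].append(url)

index = {}
index_page(index, page, parser.words)
print(buckets)
print(phrase_search(index, "staff intranet"))
```

Phrase search is keyed on sequence numbers: a two-word phrase matches only where the second word's sequence number is exactly one greater than the first word's on the same page. The stored line number is then used to report where the match was found.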