
Bibliographic Details
Main Author: SYAMSU, IQBAL (NIM 23205037)
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/10683
Institution: Institut Teknologi Bandung
Description
Summary: As the web grows and the need to handle ever larger numbers of web documents increases, a high-performance web crawler is required. A single web crawler is practically incapable of meeting this need, whereas a high-performance crawler makes parallel processing possible. The approach taken here is therefore to build a parallel, distributed crawling system, which allows a large number of web pages to be handled in a shorter period of time.

This thesis describes the design of a distributed crawler for a web search engine. The design focuses on issues such as overlap and communication overhead, and on how their effects can be minimized with a coordinated system. The design consists of four crawler processes, which were tested as an intra-site parallel crawler network using a breadth-first strategy in exchange mode. Analysis was performed on a 1.2 GB data sample produced by the crawler, using queries against a database coordinator.

With distributed parallel processing, crawler performance increases; however, adding more processes is not always directly proportional to the performance gained. In addition, modeling with exchange mode yields a smaller overlap value, (N - I)/I, while increasing the coverage value.
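The abstract names the key design elements (four crawler processes, breadth-first ordering, exchange mode) without giving code, so the following Python sketch is only an illustration of how those pieces can fit together, not the thesis's implementation: URLs are hash-partitioned across processes, each process crawls its own partition breadth-first, and links that belong to another partition are exchanged rather than fetched locally. All identifiers (Crawler, partition_of, fetch, extract_links) are hypothetical.

```python
# Illustrative sketch only -- the thesis publishes no code, so the names
# Crawler, partition_of, fetch and extract_links are all hypothetical.
from collections import deque
from hashlib import md5
from urllib.parse import urlparse

NUM_CRAWLERS = 4  # the design under test used four crawler processes

def partition_of(url: str) -> int:
    """Assign every URL to exactly one crawler by hashing its host name."""
    host = urlparse(url).netloc
    return int(md5(host.encode()).hexdigest(), 16) % NUM_CRAWLERS

class Crawler:
    def __init__(self, my_id, outboxes):
        self.my_id = my_id
        self.frontier = deque()   # FIFO queue gives breadth-first order
        self.seen = set()         # URLs already queued or fetched locally
        self.outboxes = outboxes  # one message queue per crawler process

    def enqueue(self, url):
        owner = partition_of(url)
        if owner == self.my_id:
            if url not in self.seen:
                self.seen.add(url)
                self.frontier.append(url)
        else:
            # Exchange mode: instead of fetching a foreign URL (which
            # would create overlap), hand it to the process that owns it.
            self.outboxes[owner].append(url)

    def step(self, fetch, extract_links):
        """Fetch one page from the frontier and enqueue its out-links."""
        if not self.frontier:
            return
        url = self.frontier.popleft()
        page = fetch(url)                 # download the page
        for link in extract_links(page):  # parse out-links
            self.enqueue(link)
```

Partitioning by host name keeps all pages of a site with a single process, which is what makes the overlap of exchange mode small: no URL is ever downloaded by two processes, at the cost of the inter-process communication the abstract calls communication overhead.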
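For the overlap figure mentioned above, a small worked example may help. The abstract gives only the formula (N - I)/I; assuming, as is standard for this metric, that N is the total number of page downloads across all processes and I is the number of distinct pages obtained (the sample figures below are invented for illustration), overlap measures the fraction of wasted duplicate downloads:

```python
# Hedged example: N and I follow the standard reading of the metric,
# and the numbers below are purely illustrative, not thesis results.
def overlap(n_total: int, i_unique: int) -> float:
    """Overlap metric (N - I) / I: fraction of duplicate downloads."""
    return (n_total - i_unique) / i_unique

print(overlap(10_000, 10_000))  # 0.0 -> no page fetched twice
print(overlap(12_000, 10_000))  # 0.2 -> 20% of unique pages re-fetched
```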