#TITLE_ALTERNATIVE#

Bibliographic Details
Main Author: SYAMSU, IQBAL (NIM 23205037)
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/10683
Institution: Institut Teknologi Bandung
Building: Institut Teknologi Bandung Library
Country: Indonesia
Collection: Digital ITB
Description:

As the web grows and the demand for ever larger collections of web documents increases, a high-performance web crawler is required. In practice, a single crawler cannot handle this demand. Achieving high performance therefore calls for parallel processing. The approach taken is to build a parallel, distributed crawling system, which allows a large number of web pages to be processed in a shorter period of time.

This thesis describes the design of a distributed crawler for a web search engine. The design focuses on issues such as overlap and communication overhead, and on how a coordinated system can minimize their effects. The design consists of four crawler processes, tested as an intra-site parallel crawler using a breadth-first strategy in exchange mode. The analysis was carried out on a 1.2 GB data sample produced by the crawler and retrieved by querying the coordinator database.

Distributed parallel processing increases crawler performance. However, adding more processes does not always yield a proportional gain in performance. In addition, the exchange-mode model yields a smaller overlap value, (N - I)/I, while also increasing the scope value.
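The trade-off described in the abstract, between overlap, scope, and communication overhead, can be illustrated with a minimal Python sketch of a toy parallel crawler. This is not the thesis implementation. Several elements are hypothetical:

- the web graph (`LINKS`, `SEEDS`);
- the function names (`owner`, `crawl`, `overlap`, `coverage`);
- the byte-sum host partitioning.

The abstract mentions only exchange mode. The "independent" and "firewall" modes are borrowed, for comparison, from the common parallel-crawler classification of Cho and Garcia-Molina. The sketch also makes these assumptions:

- N is the total number of downloads.
- I is the number of unique pages downloaded.
- The abstract's "scope" corresponds to coverage, that is, the fraction of the pages that should be crawled that were actually downloaded.

All four "processes" run inside one Python program, loosely mirroring an intra-site setup.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical toy web graph: page URL -> URLs it links to.
LINKS = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://c.example/"],
    "http://b.example/": ["http://b.example/1", "http://a.example/1"],
    "http://b.example/1": ["http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}
SEEDS = ["http://a.example/", "http://b.example/"]


def owner(url: str, n_procs: int) -> int:
    """Site partitioning: every page of a host belongs to one process.
    The byte sum of the host name is a deliberately simple stand-in for a hash."""
    return sum(urlsplit(url).netloc.encode()) % n_procs


def crawl(links, seeds, n_procs, mode="exchange"):
    """Simulate n_procs breadth-first crawlers taking turns, one page each per round.

    mode "independent": no coordination, every process follows every link itself.
    mode "firewall":    links into another process's partition are discarded.
    mode "exchange":    such links are forwarded (one message each) to their owner.
    Returns (downloads, messages); downloads lists every fetch, duplicates included.
    """
    queues = [deque() for _ in range(n_procs)]   # per-process FIFO frontier (BFS)
    seen = [set() for _ in range(n_procs)]       # URLs each process has queued
    for url in seeds:
        p = owner(url, n_procs)
        if url not in seen[p]:
            seen[p].add(url)
            queues[p].append(url)
    downloads, messages = [], 0
    while any(queues):
        for p in range(n_procs):
            if not queues[p]:
                continue
            url = queues[p].popleft()
            downloads.append(url)                # "fetch" the page
            for link in links.get(url, []):
                q = p if mode == "independent" else owner(link, n_procs)
                if q != p:
                    if mode == "firewall":
                        continue                 # foreign link dropped
                    messages += 1                # exchange: send URL to its owner
                if link not in seen[q]:          # the receiving process deduplicates
                    seen[q].add(link)
                    queues[q].append(link)
    return downloads, messages


def overlap(downloads):
    """(N - I) / I with N = total downloads and I = unique pages downloaded."""
    n, i = len(downloads), len(set(downloads))
    return (n - i) / i if i else 0.0


def coverage(downloads, universe):
    """Fraction of the pages that should be crawled that were actually downloaded."""
    return len(set(downloads) & universe) / len(universe)


if __name__ == "__main__":
    universe = set(LINKS)
    for mode in ("independent", "firewall", "exchange"):
        downloads, messages = crawl(LINKS, SEEDS, 4, mode)
        print(f"{mode:<11} overlap={overlap(downloads):.2f} "
              f"coverage={coverage(downloads, universe):.2f} messages={messages}")
```

With four processes, the four hosts land in four different partitions. Running the script prints:

```
independent overlap=1.00 coverage=1.00 messages=0
firewall    overlap=0.00 coverage=0.67 messages=0
exchange    overlap=0.00 coverage=1.00 messages=5
```

In this toy model, exchange mode removes duplicate downloads without losing pages. The price is inter-process messages, which is the communication overhead the design aims to minimize.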