#TITLE_ALTERNATIVE#
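As the web grows and the demand for larger collections of web documents increases, a high-performance web crawler is required. A single crawler process cannot practically meet this demand, so high performance has to come from parallel processing. The approach taken is therefore to build a parallel, distributed crawling system that can process a large number of web pages in a shorter time.

This thesis describes the design of a distributed crawler for a web search engine. The design focuses on issues such as overlap and communication overhead, and on how a coordinated system can minimize their effects. It consists of four crawler processes, tested as an intra-site parallel crawler on a network using a breadth-first strategy in exchange mode. The analysis was performed on a 1.2 GB data sample produced by the crawlers and retrieved by querying the coordinator database.

Distributed parallel processing improves crawler performance. However, adding more processes does not always increase performance proportionally. In addition, the exchange-mode model yields a smaller overlap, (N-I)/I, while increasing coverage.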
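The record's abstract names two ideas that a small simulation can make concrete: exchange mode, in which each crawler process downloads only the URLs in its own partition and forwards every other link to the process that owns it, and the overlap measure (N-I)/I. The sketch below is a hypothetical illustration under stated assumptions, not the thesis's implementation. It assumes N is the total number of downloads across all processes and I the number of distinct pages downloaded (the usual definitions in the parallel-crawler literature), takes coverage as I divided by the number of pages in the graph, partitions URLs by a CRC32 hash of the host name, and replaces the real networked processes and coordinator database with four simulated breadth-first crawlers taking turns over a made-up graph. An uncoordinated baseline is included for comparison. The names `crawl`, `owner`, `overlap_and_coverage`, `GRAPH` and `SEEDS` are illustrative only.

```python
import zlib
from collections import deque
from urllib.parse import urlsplit

# Hypothetical toy web graph: page URL -> list of out-links.
GRAPH = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://c.example/", "http://a.example/"],
    "http://b.example/": ["http://b.example/1", "http://c.example/"],
    "http://b.example/1": ["http://d.example/", "http://a.example/1"],
    "http://c.example/": ["http://c.example/1", "http://d.example/"],
    "http://c.example/1": ["http://a.example/"],
    "http://d.example/": ["http://d.example/1", "http://b.example/"],
    "http://d.example/1": [],
}
SEEDS = ["http://a.example/", "http://c.example/"]


def owner(url: str, n_procs: int) -> int:
    """Assumed partitioning: hash the host so each site belongs to one process."""
    host = urlsplit(url).netloc
    return zlib.crc32(host.encode("utf-8")) % n_procs


def crawl(graph, seeds, n_procs=4, exchange=True):
    """Simulate n_procs breadth-first crawlers that take turns, one page each.

    exchange=True: each process downloads only URLs it owns and forwards
    every other link to the owning process (exchange mode).
    exchange=False: uncoordinated baseline; each process follows every link
    itself, so different processes may download the same page.
    Returns (downloads, messages): downloads is a list of (process, url),
    messages counts links forwarded between processes.
    """
    frontiers = [deque() for _ in range(n_procs)]  # FIFO queues: breadth-first
    queued = [set() for _ in range(n_procs)]       # per-process duplicate filter
    for i, url in enumerate(seeds):
        p = owner(url, n_procs) if exchange else i % n_procs
        if url not in queued[p]:
            queued[p].add(url)
            frontiers[p].append(url)

    downloads, messages = [], 0
    while any(frontiers):
        for p in range(n_procs):
            if not frontiers[p]:
                continue
            url = frontiers[p].popleft()
            downloads.append((p, url))              # "download" the page
            for link in graph.get(url, []):
                target = owner(link, n_procs) if exchange else p
                if target != p:
                    messages += 1                   # link sent to another process
                if link not in queued[target]:      # owner drops duplicates
                    queued[target].add(link)
                    frontiers[target].append(link)
    return downloads, messages


def overlap_and_coverage(downloads, universe):
    """Overlap (N - I) / I and coverage I / |universe|."""
    n = len(downloads)                        # N: total downloads
    i = len({url for _, url in downloads})    # I: distinct pages downloaded
    return (n - i) / i, i / len(universe)


if __name__ == "__main__":
    for mode in (True, False):
        downloads, messages = crawl(GRAPH, SEEDS, n_procs=4, exchange=mode)
        ov, cov = overlap_and_coverage(downloads, set(GRAPH))
        print(f"exchange={mode}: N={len(downloads)} overlap={ov:.2f} "
              f"coverage={cov:.2f} messages={messages}")
```

On this toy graph, exchange mode downloads each of the 8 pages exactly once (overlap 0.00, coverage 1.00), while the uncoordinated baseline has two processes that each reach and download all 8 pages (overlap 1.00). The `messages` counter tallies links forwarded between processes, a rough stand-in for the communication overhead that exchange mode accepts in return for lower overlap.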
Main Author: | SYAMSU (NIM 23205037), IQBAL |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/10683 |
Institution: | Institut Teknologi Bandung |
id | id-itb.:10683 |
---|---|
datestamp | 2017-09-27T15:37:37Z |
institution | Institut Teknologi Bandung |
building | Institut Teknologi Bandung Library |
continent | Asia |
country | Indonesia |
content_provider | Institut Teknologi Bandung |
collection | Digital ITB |
language | Indonesia |
format | Theses |
author | SYAMSU (NIM 23205037), IQBAL |
title | #TITLE_ALTERNATIVE# |
url | https://digilib.itb.ac.id/gdl/view/10683 |