DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE

Online shops are growing rapidly, and many websites now provide a platform for anyone who wants to run one. The increasing number of online shops is currently a problem for Badan Pusat Statistik (Statistics Indonesia), which is responsible for data collection of all bus...

Bibliographic Details
Main Author:	Inayati - NIM: 23216038 , Nur’izzah
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/29798
Institution:	Institut Teknologi Bandung

id	id-itb.:29798
spelling	id-itb.:29798 2018-10-01T10:02:43Z DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE Inayati - NIM: 23216038 , Nur’izzah Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/29798 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Online shops are growing rapidly, and many websites now provide a platform for anyone who wants to run one. The increasing number of online shops is currently a problem for Badan Pusat Statistik (Statistics Indonesia), which is responsible for data collection of all business activities in Indonesia, because it is difficult to obtain information about online businesses conducted by respondents and household members. Web crawling and web scraping are two ways to extract data from web pages. Because online shop sites use dynamic pages, simple web crawlers cannot retrieve data from those pages. This research proposes a mechanism for crawling web pages with dynamic data that runs in a distributed manner. The data extracted is the data of each shop account at two online shop sites. To extract data automatically, an automated extraction mechanism is designed using semantic analysis. To speed up the crawling process, a distributed crawling mechanism is designed using Apache Spark. A prototype was built to test the design, and experiments with the prototype measured the performance of the proposed distributed crawling. The experimental results show that automated extraction using semantic analysis achieves 100 percent precision and 94.94 percent recall. Distributed crawling can speed up the crawling process and simplify scalability settings: to increase the capacity of the extracted data, it is enough to add resources in the form of a node, without changing the application.
format	Theses
author	Inayati - NIM: 23216038 , Nur’izzah
spellingShingle	Inayati - NIM: 23216038 , Nur’izzah DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
author_facet	Inayati - NIM: 23216038 , Nur’izzah
author_sort	Inayati - NIM: 23216038 , Nur’izzah
title	DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
title_short	DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
title_full	DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
title_fullStr	DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
title_full_unstemmed	DISTRIBUTED CRAWLING ON ONLINE SHOP WEBSITE
title_sort	distributed crawling on online shop website
url	https://digilib.itb.ac.id/gdl/view/29798
_version_	1822923036190310400

Similar Items