Online shopping sites crawler
Main Author: | Leong, Letitia Justina Si En |
---|---|
Other Authors: | Liang Qian Hui |
Format: | Final Year Project |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering |
Online Access: | http://hdl.handle.net/10356/69155 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-69155 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-69155 2023-03-03T20:56:59Z Online shopping sites crawler Leong, Letitia Justina Si En Liang Qian Hui School of Computer Engineering DRNTU::Engineering Bachelor of Engineering (Computer Science) 2016-11-11T08:48:40Z 2016-11-11T08:48:40Z 2016 Final Year Project (FYP) http://hdl.handle.net/10356/69155 en Nanyang Technological University 93 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering |
description |
The advancement of technology has brought a wide range of benefits to society, but it has also inevitably contributed to faster-paced lifestyles. Increasingly, people prefer to shop online while also looking for new ways to obtain the best bargain. The aim of this project is therefore to design a way to collect merchants’ data from multiple shopping sites and display it on a platform that enables shoppers to compare products.
Firstly, a shopping site crawler was developed using the Scrapy framework to crawl and scrape different shopping sites. Because every website is structured differently, scraping is more complicated: for the crawler to extract specific data from a website, the XPaths of that data must be specified. A Tkinter program was therefore created to avoid reworking code for each site and to make it convenient to configure new and existing web spiders. Secondly, the collected merchants’ data undergoes text mining, which comprises preprocessing, clustering and topic modelling. Clustering and topic modelling were used to detect patterns for grouping similar products together and to discover attractive topics. These results are presented to shoppers in a way that allows them to search for their desired products easily and efficiently.
Thirdly, a frontend web application was built to display recommended products, appealing product themes, and all merchants that provide the same product or the same kind of product. Filters were also implemented so that users can search according to their preferences. Lastly, a backend web application was set up to manage product-related data in the database.
By the end of the project, all objectives were accomplished. Some limitations of the developed system remain unresolved because of time constraints and limited manpower; these limitations, along with suggestions for further enhancement, can be addressed in future work. |
author2 |
Liang Qian Hui |
format |
Final Year Project |
author |
Leong, Letitia Justina Si En |
title |
Online shopping sites crawler |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/69155 |
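The project abstract explains two points about the crawler. Each site's data is located by XPaths, and a Tkinter tool lets spiders be configured without rewriting code. The sketch below illustrates that idea under stated assumptions and is not the project's code. The site configuration, field names and sample page are hypothetical. Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, stands in for Scrapy; in a Scrapy spider the same paths would instead be passed to `response.xpath()`.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration, of the kind a GUI tool could save as JSON.
# Only these paths change from site to site; the extraction code stays the same.
SITE_CONFIG_JSON = """
{
  "site": "example-shop",
  "item": ".//div[@class='product']",
  "fields": {
    "name":  {"path": "./h2"},
    "price": {"path": "./span[@class='price']"},
    "url":   {"path": "./a", "attr": "href"}
  }
}
"""

# Hypothetical, well-formed sample page (real HTML usually needs an HTML parser).
SAMPLE_PAGE = """
<html><body>
  <div class="product">
    <h2>USB-C Cable</h2><span class="price">4.90</span><a href="/p/1">view</a>
  </div>
  <div class="product">
    <h2>Phone Case</h2><span class="price">12.00</span><a href="/p/2">view</a>
  </div>
</body></html>
"""


def extract_items(page_xml: str, config: dict) -> list[dict]:
    """Apply a field -> path configuration to every item node on a page."""
    root = ET.fromstring(page_xml.strip())
    items = []
    for node in root.findall(config["item"]):
        record = {}
        for field, spec in config["fields"].items():
            el = node.find(spec["path"])
            if el is None:
                record[field] = None  # field missing for this item
            elif "attr" in spec:
                record[field] = el.get(spec["attr"])  # read an attribute
            else:
                record[field] = (el.text or "").strip()  # read element text
        items.append(record)
    return items


if __name__ == "__main__":
    config = json.loads(SITE_CONFIG_JSON)
    for item in extract_items(SAMPLE_PAGE, config):
        print(item)
```

To support a new site under this design, a user edits the JSON paths rather than the Python code. That is the kind of rework the Tkinter configuration tool is described as avoiding.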
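The abstract describes a text-mining step that groups similar products. The record does not name the clustering algorithm the project used, so the sketch below substitutes a deliberately simple one. Titles are preprocessed (lowercased, tokenised, stopwords removed) and then greedily grouped by the Jaccard similarity of their term sets. The stopword list, the 0.3 threshold and the product titles are illustrative assumptions. Topic modelling is not shown.

```python
from __future__ import annotations

import re

# Illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

# Hypothetical product titles as they might arrive from several merchants.
TITLES = [
    "Apple iPhone 7 Case Black",
    "iPhone 7 Silicone Case",
    "Samsung Galaxy S7 Screen Protector",
    "Tempered Glass Screen Protector for Galaxy S7",
    "Wireless Mouse",
]


def preprocess(title: str) -> set[str]:
    """Lowercase, split into alphanumeric tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(titles: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: add each title to the first cluster whose first
    member is at least `threshold` similar; otherwise start a new cluster."""
    clusters: list[tuple[set[str], list[str]]] = []
    for title in titles:
        terms = preprocess(title)
        for rep_terms, members in clusters:
            if jaccard(terms, rep_terms) >= threshold:
                members.append(title)
                break
        else:  # no break: nothing was similar enough
            clusters.append((terms, [title]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    for group in cluster(TITLES):
        print(group)
```

With these sample titles, the two iPhone cases form one group, the two Galaxy S7 screen protectors form another, and the mouse stands alone. A greedy single pass is order-dependent and compares each title only with a cluster's first member. Vector-based methods such as k-means over TF-IDF features are a more common choice when a library like scikit-learn is available.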
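According to the abstract, the frontend lists the merchants offering the same kind of product and lets users filter by preference. Below is a minimal sketch of such a filter. It assumes a hypothetical `Product` record with name, merchant, price and category fields, since the actual database schema is not described. Results are sorted by price so that merchants can be compared directly.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Product:
    # Hypothetical record shape; the project's real schema is not described.
    name: str
    merchant: str
    price: float
    category: str


def filter_products(
    products: Iterable[Product],
    category: Optional[str] = None,
    max_price: Optional[float] = None,
    merchants: Optional[set[str]] = None,
) -> list[Product]:
    """Keep products matching every filter that was supplied, cheapest first."""
    selected = [
        p for p in products
        if (category is None or p.category == category)
        and (max_price is None or p.price <= max_price)
        and (merchants is None or p.merchant in merchants)
    ]
    return sorted(selected, key=lambda p: p.price)


if __name__ == "__main__":
    catalogue = [
        Product("iPhone 7 Case", "ShopA", 12.90, "accessories"),
        Product("iPhone 7 Case", "ShopB", 9.50, "accessories"),
        Product("Wireless Mouse", "ShopA", 25.00, "computing"),
        Product("iPhone 7 Case", "ShopC", 15.00, "accessories"),
    ]
    for p in filter_products(catalogue, category="accessories", max_price=13.0):
        print(p.merchant, p.price)
```

Filters left as `None` are ignored, so the same function serves both a broad listing and a narrow preference search.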