Efficient Singapore room rental search with data mining
The author asks how data mining techniques can be utilised to improve the efficiency of room rental search. The first objective of this study is to develop a clustering method in the context of Singapore room rental listing retrieval, called Relevance-based Cluste...
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2014 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/61615 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-61615 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-61615 2019-12-10T13:57:50Z Efficient Singapore room rental search with data mining Koh, Fabian Wee Kim Wee School of Communication and Information Cong Gao DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Master of Science (Information Studies) 2014-06-17T03:27:10Z 2014-06-17T03:27:10Z 2014 2014 Thesis http://hdl.handle.net/10356/61615 en Nanyang Technological University 60 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Koh, Fabian Efficient singapore room rental search with data mining |
description |
The author asks how data mining techniques can be used to improve the efficiency of room rental search. The first objective of this study is to develop a clustering method for Singapore room rental listing retrieval, called Relevance-based Clustering. The proposed method adds geographical relationships among the search results retrieved by textual relevance.
The second objective is to develop a Rental Property Search Engine that demonstrates how Relevance-based Clustering can make room rental search in Singapore more efficient. An essential part of this process is the ability to extract geographical information from web pages. The author narrows the scope of the study to Singapore property websites, where geographical information can be extracted easily from the map latitude and longitude data available on all the major property websites in Singapore.
The rental property search engine is custom-coded by the author in Python 2.7 and deployed on the Google App Engine (GAE) cloud hosting platform.
The search engine consists of a property content web crawler that crawls the rental sections of Singapore property websites and downloads the content of each URL into the Listing table. Next, a Data Pre-processing step cleanses and tokenizes the downloaded content to create and update the Inverted Index. Processed URLs are recorded in the Done-Process table to prevent duplicate effort.
On receiving a user query, the Query Parsing process cleanses and tokenizes the query text and passes it to the Scoring and Ranking process, which converts it into vector form for Cosine Similarity score computation. The scores are ranked, and the top K listings form the Top K List.
The Top K List is used to compute the URL Spherical Distance Matrix, and clustering is performed on this matrix to discover geographical relationships among the top K listings ranked by textual relevance. The clustered result is converted into HTML and returned to the user.
With K = 100, the Information Retrieval (IR) effectiveness of the search engine is low, with an average F-Measure of 26%. With K = 20, it is better, with an average F-Measure of 78%. |
author2 |
Wee Kim Wee School of Communication and Information |
author_facet |
Wee Kim Wee School of Communication and Information Koh, Fabian |
format |
Theses and Dissertations |
author |
Koh, Fabian |
author_sort |
Koh, Fabian |
title |
Efficient Singapore room rental search with data mining |
title_short |
Efficient Singapore room rental search with data mining |
title_full |
Efficient Singapore room rental search with data mining |
title_fullStr |
Efficient Singapore room rental search with data mining |
title_full_unstemmed |
Efficient Singapore room rental search with data mining |
title_sort |
efficient singapore room rental search with data mining |
publishDate |
2014 |
url |
http://hdl.handle.net/10356/61615 |
_version_ |
1681037483314249728 |
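
The query-time steps described in the abstract (query parsing, Cosine Similarity scoring, the Top K List, the spherical distance matrix, and clustering) can be sketched roughly as below. This is a minimal illustration and not the thesis code: it is written for Python 3 rather than Python 2.7, it scores every listing directly instead of using an Inverted Index, and it assumes raw term-frequency weights, the haversine formula for spherical distance, and a single-linkage grouping with a hypothetical `max_km` threshold, since the abstract does not name the weighting scheme or the clustering algorithm. The listings, URLs, coordinates, and function names are invented for the example.

```python
import math
import re
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, listings, k):
    """Return up to k listings with a non-zero score, best match first."""
    query_vec = Counter(tokenize(query))
    scored = []
    for item in listings:
        score = cosine_similarity(query_vec, Counter(tokenize(item["text"])))
        if score > 0:
            scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix(items):
    """Pairwise spherical distances between the listings' coordinates."""
    coords = [(item["lat"], item["lon"]) for item in items]
    return [[haversine_km(a, b) for b in coords] for a in coords]


def cluster(matrix, max_km):
    """Single-linkage grouping (an assumption): listings linked by a chain
    of hops no longer than max_km receive the same cluster label."""
    n = len(matrix)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and matrix[i][j] <= max_km:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical listings; coordinates are approximate and for illustration only.
listings = [
    {"url": "https://example.com/a", "text": "Common room for rent near Jurong East MRT", "lat": 1.333, "lon": 103.742},
    {"url": "https://example.com/b", "text": "Master room for rent in Clementi", "lat": 1.315, "lon": 103.765},
    {"url": "https://example.com/c", "text": "Common room for rent in Tampines", "lat": 1.353, "lon": 103.945},
    {"url": "https://example.com/d", "text": "Room for rent in Bedok near MRT", "lat": 1.324, "lon": 103.930},
    {"url": "https://example.com/e", "text": "Whole condo unit for sale", "lat": 1.300, "lon": 103.850},
]

top = top_k("room rent mrt", listings, k=4)
labels = cluster(distance_matrix(top), max_km=5.0)
for item, label in zip(top, labels):
    print(label, item["url"])
```

With these made-up coordinates, the two western listings and the two eastern listings fall into separate groups, which is the kind of geographical relationship among textually relevant results that the abstract describes. A different clustering algorithm or threshold would change the grouping.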