Efficient Singapore room rental search with data mining

This study asks how data mining techniques can be used to improve the efficiency of room rental search. Its first objective is to develop a clustering method for Singapore room rental listing retrieval, called Relevance-based Cluste...

Bibliographic Details
Main Author: Koh, Fabian
Other Authors: Wee Kim Wee School of Communication and Information
Format: Theses and Dissertations
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/61615
Institution: Nanyang Technological University
id sg-ntu-dr.10356-61615
record_format dspace
spelling sg-ntu-dr.10356-61615 2019-12-10T13:57:50Z Efficient singapore room rental search with data mining Koh, Fabian Wee Kim Wee School of Communication and Information Cong Gao DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Master of Science (Information Studies) 2014-06-17T03:27:10Z 2014-06-17T03:27:10Z 2014 2014 Thesis http://hdl.handle.net/10356/61615 en Nanyang Technological University 60 p. application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
description This study asks how data mining techniques can be used to improve the efficiency of room rental search. Its first objective is to develop a clustering method for Singapore room rental listing retrieval, called Relevance-based Clustering, which adds geographical relationships to search results ranked by textual relevance. The second objective is to develop a rental property search engine that demonstrates how Relevance-based Clustering makes room rental search in Singapore more efficient. An essential part of this process is the ability to extract geographical information from webpages. The author narrows the scope of the study to Singapore property websites, where geographical information can be easily extracted from the map latitude and longitude available on all of the major property websites in Singapore. The search engine is custom-coded by the author in Python 2.7 and deployed on the Google App Engine (GAE) cloud hosting platform. It includes a property content web crawler that crawls the rental sections of Singapore property websites and downloads the content of each URL into the Listing table. Next, a data pre-processing step cleanses and tokenizes the downloaded content and uses it to create and update the inverted index. Processed URLs are recorded in the Done-Process table to prevent duplicate effort. When a user submits a query, the query parsing step cleanses and tokenizes the query text, and the scoring and ranking step converts it into vector form to compute cosine similarity scores. The scores are ranked, and the top K listings form the Top K List.
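The indexing and ranking steps described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the thesis code (which targets Python 2.7 on GAE). The tokenization rule, the TF-IDF weighting, the function names, and the sample listings are all assumptions made for the example; the abstract states only that cleansed, tokenized text is converted to vector form and scored by cosine similarity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(listings):
    """Map each term to {url: term frequency} over the downloaded listings."""
    index = defaultdict(dict)
    for url, text in listings.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][url] = tf
    return index


def top_k(query, index, n_docs, k):
    """Rank listings by cosine similarity of TF-IDF vectors (assumed weighting)."""
    def idf(term):
        return math.log(1 + n_docs / len(index[term]))

    # Squared vector length of every listing, used to normalize dot products.
    doc_norm = defaultdict(float)
    for term, postings in index.items():
        for url, tf in postings.items():
            doc_norm[url] += (tf * idf(term)) ** 2

    # Query terms that never occur in the index cannot contribute to a score.
    query_tf = Counter(t for t in tokenize(query) if t in index)
    query_norm = math.sqrt(sum((tf * idf(t)) ** 2 for t, tf in query_tf.items()))
    if query_norm == 0:
        return []

    # Accumulate dot products by walking only the postings of the query terms.
    dots = defaultdict(float)
    for term, q_tf in query_tf.items():
        for url, tf in index[term].items():
            dots[url] += (q_tf * idf(term)) * (tf * idf(term))

    scores = {u: d / (query_norm * math.sqrt(doc_norm[u])) for u, d in dots.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical listings standing in for rows of the Listing table.
listings = {
    "http://example.com/a": "Common room near Bedok MRT",
    "http://example.com/b": "Master room in Jurong East",
    "http://example.com/c": "Studio apartment Orchard",
}
index = build_inverted_index(listings)
print(top_k("room near Bedok MRT", index, len(listings), k=2))
```

Walking the postings lists of the query terms, rather than scoring every listing, is the usual reason for keeping an inverted index: listings that share no term with the query are never touched.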
The Top K List is used to compute the URL Spherical Distance Matrix, and clustering is performed on this matrix to discover geographical relationships among the top K textually relevant listings. The clustered result is converted into HTML and returned to the user. With K = 100, the Information Retrieval (IR) effectiveness of the search engine is low, with an average F-Measure of 26%, whereas with K = 20 the average F-Measure improves to 78%.
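As a rough illustration of the distance matrix and clustering step, the sketch below computes pairwise great-circle distances with the haversine formula and then groups listings that lie within a distance threshold of one another. The abstract names neither the distance formula nor the clustering algorithm, so both choices here are assumptions, as are the function names, the 2 km threshold, and the sample coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def distance_matrix(points):
    """Symmetric matrix of pairwise great-circle distances in km."""
    return [[haversine_km(p, q) for q in points] for p in points]


def threshold_clusters(matrix, max_km):
    """Label listings so that any two within max_km share a cluster.

    This is single-linkage grouping by connected components, an assumed
    stand-in for whatever clustering method the thesis uses.
    """
    n = len(matrix)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and matrix[i][j] <= max_km:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical (lat, lon) pairs for three listings in the Top K List.
coords = [(1.3240, 103.9300), (1.3270, 103.9320), (1.3330, 103.7420)]
matrix = distance_matrix(coords)
print(threshold_clusters(matrix, max_km=2.0))
```

A threshold-based grouping is used here only because it works directly on a precomputed distance matrix and needs no cluster count in advance; other matrix-based methods, such as hierarchical clustering, would fit the same interface.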
author2 Wee Kim Wee School of Communication and Information
author_facet Wee Kim Wee School of Communication and Information
Koh, Fabian
format Theses and Dissertations
author Koh, Fabian
author_sort Koh, Fabian
title Efficient Singapore room rental search with data mining
title_sort efficient singapore room rental search with data mining
publishDate 2014
url http://hdl.handle.net/10356/61615
_version_ 1681037483314249728