Web crawler and NLP enabled data mining : a statistical study on the formation of hotel ratings

Bibliographic Details
Main Authors: Xu, Yingchun, Yang, Guang, Zou, Peijun
Other Authors: Goh Kim Huat
Format: Final Year Project
Language: Chinese
Published: 2014
Online Access: http://hdl.handle.net/10356/55818
Institution: Nanyang Technological University
Description
Summary: Web crawling is regarded as one of the most effective ways of extracting large amounts of data from websites, and natural language processing (NLP) programs can, to some extent, understand human language. In this report, web crawling and NLP were used to extract reviewer opinions from Tripadvisor webpages. We studied opinions on 50 hotels located in Las Vegas, United States of America, and constructed a model to predict customer ratings from reviewer opinions, reviewer experience and hotel ranking. Reviewer ratings of a given hotel were found to correlate positively with both reviewer opinions and reviewer experience, and negatively with hotel ranking. Future research directions include improving the accuracy of the NLP step and applying the approach to other industries such as entertainment and consumer goods.
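
Illustration: the kind of correlation check described in the summary can be sketched in a few lines of Python. This is only a minimal sketch and not the authors' model. The field names (rating, opinion, experience, hotel_rank) and every value in the toy records are made up for illustration and are not taken from the report. It also assumes that rank 1 denotes the best hotel, so a negative correlation with rank means better-ranked hotels receive higher ratings.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records (NOT data from the report):
# rating      star rating given by the reviewer (1 to 5)
# opinion     sentiment score produced by some NLP step (-1 to 1)
# experience  number of reviews the reviewer has written before
# hotel_rank  position of the hotel in the site's ranking (1 = best, assumed)
reviews = [
    {"rating": 5, "opinion": 0.9, "experience": 40, "hotel_rank": 2},
    {"rating": 4, "opinion": 0.6, "experience": 25, "hotel_rank": 10},
    {"rating": 4, "opinion": 0.5, "experience": 30, "hotel_rank": 8},
    {"rating": 2, "opinion": -0.2, "experience": 5, "hotel_rank": 35},
    {"rating": 1, "opinion": -0.8, "experience": 3, "hotel_rank": 47},
]

ratings = [r["rating"] for r in reviews]
correlations = {
    name: pearson(ratings, [r[name] for r in reviews])
    for name in ("opinion", "experience", "hotel_rank")
}

for name, value in correlations.items():
    print(f"rating vs {name}: r = {value:+.2f}")
```

On these toy values the signs match the direction reported in the summary (positive for opinion and experience, negative for hotel rank), but only because the records were constructed that way. The report's actual predictive model and data are not reproduced here.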