Automatic sentiment classification of movie reviews.

The increasing number of online reviews of goods and services has led to the development of many approaches for sentiment classification and analysis. This study presents a framework for sentiment classification for movie reviews. There are several existing approaches for sentiment classificati...

Bibliographic Details
Main Author: Chan, Kok Hong.
Other Authors: Wee Kim Wee School of Communication and Information
Format: Research Report
Language: English
Published: 2009
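Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing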
Online Access: http://hdl.handle.net/10356/17252
Institution: Nanyang Technological University
id sg-ntu-dr.10356-17252
record_format dspace
spelling sg-ntu-dr.10356-17252 2019-12-10T14:32:10Z Automatic sentiment classification of movie reviews. Chan, Kok Hong. Wee Kim Wee School of Communication and Information DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing 2009-06-03T06:49:02Z 2009-06-03T06:49:02Z 2008 2008 Research Report http://hdl.handle.net/10356/17252 en 97 p. application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description The increasing number of online reviews of goods and services has led to the development of many approaches to sentiment classification and analysis. This study presents a framework for sentiment classification of movie reviews. Of the several existing approaches, classification using unigrams has been the most successful in most previous studies. However, results generated by unigrams can be degraded by negation terms and by terms that require the reader to make inferences. Several studies indicate that higher-order n-grams have good potential to produce better classification. Problems that unigrams encounter, such as negation, could be solved by higher-order n-grams such as bigrams, because a phrase like “not good” is extracted as a single term. In addition, higher-order n-grams combined with feature reduction methods, such as χ² feature reduction, are explored to see whether they produce better results. Movie review datasets were selected because movie reviews are considered one of the most difficult domains to classify. Producing good classification results in the movie review domain will ensure that good results are achieved when the approach is applied to other datasets. The research method consists of three parts. First, the results of the simple unigram approach in this study are compared with those presented by Pang, Lee & Vaithyanathan (2002). Second, the classification results generated by higher-order n-grams and adjectives are compared with those presented by Pang et al. (2002). Lastly, the classification results after applying feature reduction methods such as χ² feature reduction are compared. An application has also been developed for non-technical users, so that they are spared the tedious process of creating a training set and running sentiment classification. The application is also bundled with additional feature selection options.
author2 Wee Kim Wee School of Communication and Information
author_facet Wee Kim Wee School of Communication and Information
Chan, Kok Hong.
format Research Report
author Chan, Kok Hong.
author_sort Chan, Kok Hong.
title Automatic sentiment classification of movie reviews.
publishDate 2009
url http://hdl.handle.net/10356/17252
_version_ 1681045436525182976
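
The abstract's two technical points, that bigrams can capture negation which unigrams miss and that χ² feature reduction can rank candidate terms, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the report's implementation: the four toy reviews and their labels are invented, the function names `ngrams` and `chi_square` are hypothetical, tokenization is plain whitespace splitting, and the score is the standard χ² statistic for a 2x2 table of term presence against class.

```python
def ngrams(text, n):
    """Return the word n-grams of text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square(term, docs, labels, n, positive="pos"):
    """Chi-square statistic for the 2x2 table: term present/absent vs. class.

    a: positive docs containing the term    b: other docs containing it
    c: positive docs without the term       d: other docs without it
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in ngrams(doc, n)
        if present and label == positive:
            a += 1
        elif present:
            b += 1
        elif label == positive:
            c += 1
        else:
            d += 1
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A term that appears in every document (or in none) carries no class signal.
    if denom == 0:
        return 0.0
    return total * (a * d - b * c) ** 2 / denom


# Invented toy data, for illustration only.
docs = [
    "the plot was not good",
    "the acting was good",
    "not good at all",
    "a good film",
]
labels = ["neg", "pos", "neg", "pos"]

unigram_score = chi_square("good", docs, labels, n=1)
bigram_score = chi_square("not good", docs, labels, n=2)
print(unigram_score, bigram_score)
```

On this toy data the unigram "good" occurs in every review and scores 0.0, while the bigram "not good" occurs only in the negative reviews and scores 4.0, so a χ² filter would keep the bigram and discard the unigram.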