A study of feature extraction techniques for classifying topics and sentiments from news posts
Many news channels now maintain Facebook pages on which news posts are published daily. These posts carry temporal opinions about social events that may change over time due to external factors, and they can also serve as a monitor of significant events...
Main Author: | Al-Dyani, Wafa Zubair Abdullah |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | T58.5-58.64 Information technology |
Online Access: | https://etd.uum.edu.my/5618/1/s814383_01.pdf https://etd.uum.edu.my/5618/2/s814383_02.pdf https://etd.uum.edu.my/5618/ |
Institution: | Universiti Utara Malaysia |
id |
my.uum.etd.5618 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.5618 2022-04-09T23:28:04Z https://etd.uum.edu.my/5618/ A study of feature extraction techniques for classifying topics and sentiments from news posts Al-Dyani, Wafa Zubair Abdullah T58.5-58.64 Information technology 2014 Thesis NonPeerReviewed text en https://etd.uum.edu.my/5618/1/s814383_01.pdf text en https://etd.uum.edu.my/5618/2/s814383_02.pdf Al-Dyani, Wafa Zubair Abdullah (2014) A study of feature extraction techniques for classifying topics and sentiments from news posts. Masters thesis, Universiti Utara Malaysia. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
T58.5-58.64 Information technology |
spellingShingle |
T58.5-58.64 Information technology Al-Dyani, Wafa Zubair Abdullah A study of feature extraction techniques for classifying topics and sentiments from news posts |
description |
Many news channels now maintain Facebook pages on which they publish news posts daily. These posts carry temporal opinions about social events, which may change over time due to external factors, and they can also serve as a monitor of significant events happening around the world. As a result, much text mining research has been conducted in the area of Temporal Sentiment Analysis, where one of the most challenging tasks is to detect and extract the key features from news posts that arrive continuously over time. Extracting these features is difficult because of the posts' complex properties; in addition, posts about a specific topic may grow or vanish over time, which produces imbalanced datasets. This study therefore carries out a comparative analysis of feature extraction techniques (TF-IDF, TF, BTO, IG, Chi-square) combined with three n-gram features (unigram, bigram, trigram), using SVM as the classifier. The aim is to identify the Feature Extraction Technique (FET) that achieves the best accuracy for both topic and sentiment classification. The analysis is conducted on datasets from three news channels. The experimental results for topic classification show that Chi-square with unigrams is the best FET among those compared. To overcome the problem of imbalanced data, the study combines the best FET with an oversampling technique. The evaluation results show improved classifier performance, with accuracies of 93.37%, 92.89%, and 91.92% for BBC, Al-Arabiya, and Al-Jazeera, respectively, higher than those obtained on the original datasets. The same combination (Chi-square + unigram) was used for sentiment classification and obtained accuracies of 81.87%, 70.01%, and 77.36%. However, testing the selected FET on unseen, randomly selected news posts yielded relatively low accuracies for both topic and sentiment classification, because topics and sentiments change over time. |
format |
Thesis |
author |
Al-Dyani, Wafa Zubair Abdullah |
author_facet |
Al-Dyani, Wafa Zubair Abdullah |
author_sort |
Al-Dyani, Wafa Zubair Abdullah |
title |
A study of feature extraction techniques for classifying topics and sentiments from news posts |
title_short |
A study of feature extraction techniques for classifying topics and sentiments from news posts |
title_full |
A study of feature extraction techniques for classifying topics and sentiments from news posts |
title_fullStr |
A study of feature extraction techniques for classifying topics and sentiments from news posts |
title_full_unstemmed |
A study of feature extraction techniques for classifying topics and sentiments from news posts |
title_sort |
study of feature extraction techniques for classifying topics and sentiments from news posts |
publishDate |
2014 |
url |
https://etd.uum.edu.my/5618/1/s814383_01.pdf https://etd.uum.edu.my/5618/2/s814383_02.pdf https://etd.uum.edu.my/5618/ |
_version_ |
1729706591799738368 |
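
The description above names Chi-square with unigram features as the best-performing feature extraction technique for topic classification. As a rough illustration of that kind of scoring, the sketch below ranks unigrams against one class using the standard 2x2 chi-square statistic over document frequencies. It is not taken from the thesis: the tokenizer, the function names (`unigrams`, `chi_square_scores`, `top_k`), and the four toy posts are all assumptions made for this example, and the SVM classification and oversampling steps are left out.

```python
from collections import Counter


def unigrams(text):
    """Lowercase whitespace tokenization; a deliberately naive tokenizer."""
    return text.lower().split()


def chi_square_scores(docs, labels, target):
    """Score each unigram against one class with the 2x2 chi-square statistic.

    a = docs of the target class containing the term
    b = docs of other classes containing the term
    c = docs of the target class without the term
    d = docs of other classes without the term
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    in_target = Counter()  # document frequency inside the target class
    in_other = Counter()   # document frequency outside the target class
    for text, y in zip(docs, labels):
        for term in set(unigrams(text)):
            (in_target if y == target else in_other)[term] += 1

    scores = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]
        b = in_other[term]
        c = n_target - a
        d = (n - n_target) - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term (or class) carries no contrast.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy, invented posts (hypothetical; not from the thesis datasets).
docs = [
    "election results announced today",
    "parliament election vote delayed",
    "team wins football final",
    "football match ends in draw",
]
labels = ["politics", "politics", "sport", "sport"]

scores = chi_square_scores(docs, labels, target="politics")
selected = top_k(scores, 2)
print(selected)
```

In a fuller pipeline, the selected terms would typically define the feature vector passed to a classifier such as an SVM; terms that appear only in one class, like "election" here, receive the highest scores.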