An enhanced feature representation based on linear regression model for stock market prediction
Stock price prediction has been an attractive research domain for both investors and computer scientists for more than a decade. Reaction prediction to the stock market, especially based on released financial news articles and published stock prices, still poses a great challenge to researchers beca...
Main Authors: | Ihlayyel, Hani; Sharef, Nurfadhlina Mohd; Ahmed Nazri, Mohd Zakree; Abu Bakar, Azuraliza |
---|---|
Format: | Article |
Language: | English |
Published: | IOS Press, 2018 |
Online Access: | http://psasir.upm.edu.my/id/eprint/73103/1/STOCK.pdf http://psasir.upm.edu.my/id/eprint/73103/ https://content.iospress.com/articles/intelligent-data-analysis/ida163316 |
Institution: | Universiti Putra Malaysia |
id |
my.upm.eprints.73103 |
---|---|
record_format |
eprints |
spelling |
my.upm.eprints.73103 2021-02-28T17:44:54Z http://psasir.upm.edu.my/id/eprint/73103/ An enhanced feature representation based on linear regression model for stock market prediction Ihlayyel, Hani; Sharef, Nurfadhlina Mohd; Ahmed Nazri, Mohd Zakree; Abu Bakar, Azuraliza. Stock price prediction has been an attractive research domain for both investors and computer scientists for more than a decade. Predicting the stock market's reaction, especially from released financial news articles and published stock prices, still poses a great challenge to researchers because the prediction accuracy is relatively low. For prediction purposes, linear regression is a popular method. Statistical metrics, such as document frequency (DF), term frequency-inverse document frequency (TF-IDF) and information gain (IG), are used for feature selection to extract the most expressive features and reduce the high dimensionality of the data. However, the effectiveness of the available metrics in identifying important financial feature representations that have dependable and strong relations with the stock price has not been explored. The objectives of this study are (i) to investigate the performance of five statistical metrics, namely, DF, TF-IDF, IG, Chi-square Statistics (Chi-Sqr) and occurrence, in identifying important features that can represent the news and have a strong relationship with the stock price; (ii) to introduce feedback variables, namely, the prediction accuracy (PA), directional accuracy (DA) and closeness accuracy (CA), to capture the interaction between the released news and the published stock prices; and (iii) to introduce a prediction model that integrates features from financial news and a stock price value series based on a 20-minute time lag using linear regression. The experiment used the ELR-BoW method to build 330 datasets, using the five statistical metrics to select feature sizes of 50, 100, 150, 200, 250, 300, 400, 500, 600, 700 and 800. The performance of ELR-BoW is observed based on three parameters, namely, PA, DA and CA, and is compared against Naïve Bayes (NB) as the benchmark approach and the Support Vector Machine (SVM). The proposed ELR-BoW-SVM obtained a higher accuracy than ELR-BoW-NB; the best feedback measure is PA, with an F-measure value of 0.842. In addition, the best number of features is 300, selected using the document frequency (DF) statistical metric. This study demonstrates that the identification of the top feature representations for financial news is highly promising for automatic news article processing in stock prediction. IOS Press 2018 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/73103/1/STOCK.pdf Ihlayyel, Hani and Sharef, Nurfadhlina Mohd and Ahmed Nazri, Mohd Zakree and Abu Bakar, Azuraliza (2018) An enhanced feature representation based on linear regression model for stock market prediction. Intelligent Data Analysis, 22 (1). 45 - 76. ISSN 1088-467X; ESSN: 1571-4128 https://content.iospress.com/articles/intelligent-data-analysis/ida163316 10.3233/IDA-163316 |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
description |
Stock price prediction has been an attractive research domain for both investors and computer scientists for more than a decade. Predicting the stock market's reaction, especially from released financial news articles and published stock prices, still poses a great challenge to researchers because the prediction accuracy is relatively low. For prediction purposes, linear regression is a popular method. Statistical metrics, such as document frequency (DF), term frequency-inverse document frequency (TF-IDF) and information gain (IG), are used for feature selection to extract the most expressive features and reduce the high dimensionality of the data. However, the effectiveness of the available metrics in identifying important financial feature representations that have dependable and strong relations with the stock price has not been explored. The objectives of this study are (i) to investigate the performance of five statistical metrics, namely, DF, TF-IDF, IG, Chi-square Statistics (Chi-Sqr) and occurrence, in identifying important features that can represent the news and have a strong relationship with the stock price; (ii) to introduce feedback variables, namely, the prediction accuracy (PA), directional accuracy (DA) and closeness accuracy (CA), to capture the interaction between the released news and the published stock prices; and (iii) to introduce a prediction model that integrates features from financial news and a stock price value series based on a 20-minute time lag using linear regression. The experiment used the ELR-BoW method to build 330 datasets, using the five statistical metrics to select feature sizes of 50, 100, 150, 200, 250, 300, 400, 500, 600, 700 and 800. The performance of ELR-BoW is observed based on three parameters, namely, PA, DA and CA, and is compared against Naïve Bayes (NB) as the benchmark approach and the Support Vector Machine (SVM). The proposed ELR-BoW-SVM obtained a higher accuracy than ELR-BoW-NB; the best feedback measure is PA, with an F-measure value of 0.842. In addition, the best number of features is 300, selected using the document frequency (DF) statistical metric. This study demonstrates that the identification of the top feature representations for financial news is highly promising for automatic news article processing in stock prediction. |
format |
Article |
author |
Ihlayyel, Hani; Sharef, Nurfadhlina Mohd; Ahmed Nazri, Mohd Zakree; Abu Bakar, Azuraliza |
spellingShingle |
Ihlayyel, Hani; Sharef, Nurfadhlina Mohd; Ahmed Nazri, Mohd Zakree; Abu Bakar, Azuraliza. An enhanced feature representation based on linear regression model for stock market prediction |
author_facet |
Ihlayyel, Hani; Sharef, Nurfadhlina Mohd; Ahmed Nazri, Mohd Zakree; Abu Bakar, Azuraliza |
author_sort |
Ihlayyel, Hani |
title |
An enhanced feature representation based on linear regression model for stock market prediction |
title_short |
An enhanced feature representation based on linear regression model for stock market prediction |
title_full |
An enhanced feature representation based on linear regression model for stock market prediction |
title_fullStr |
An enhanced feature representation based on linear regression model for stock market prediction |
title_full_unstemmed |
An enhanced feature representation based on linear regression model for stock market prediction |
title_sort |
enhanced feature representation based on linear regression model for stock market prediction |
publisher |
IOS Press |
publishDate |
2018 |
url |
http://psasir.upm.edu.my/id/eprint/73103/1/STOCK.pdf http://psasir.upm.edu.my/id/eprint/73103/ https://content.iospress.com/articles/intelligent-data-analysis/ida163316 |
_version_ |
1693727620718395392 |
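
The abstract reports that document frequency (DF) with 300 features gave the best feature selection result. As a rough illustration of what DF-based selection over a bag-of-words representation can look like, here is a minimal Python sketch. It is not the authors' ELR-BoW implementation: the function names, the alphabetical tie-breaking rule and the toy headlines are all assumptions made for this example.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def select_top_k_by_df(docs, k):
    """Return the k terms with the highest document frequency.

    Ties are broken alphabetically (an assumption, for determinism only).
    """
    df = document_frequency(docs)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_bow_vector(tokens, vocabulary):
    """Build a term-count vector restricted to the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical, already-tokenized news headlines (not data from the study).
docs = [
    ["shares", "rise", "after", "earnings"],
    ["earnings", "miss", "shares", "fall"],
    ["merger", "talks", "lift", "shares"],
]

vocab = select_top_k_by_df(docs, k=2)
vectors = [to_bow_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

In a setting like the one the abstract describes, `k` would be one of the listed feature sizes (50 to 800), and the resulting vectors would be combined with the lagged price series before fitting a model; that step is omitted here because the record does not specify it.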