Textual knowledge integration for financial asset management

Bibliographic Details
Main Author: Xing, Frank Zhutian
Other Authors: Erik Cambria
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2018
Online Access: https://hdl.handle.net/10356/87459
http://hdl.handle.net/10220/46751
Institution: Nanyang Technological University
Description
Summary: The scenario in which investors need to manage a large number of financial assets differs essentially from what most people do today for stock movement prediction. In traditional asset allocation models, the expected returns and correlations of financial assets are difficult to estimate from historical price series, which are non-stationary and volatile. Therefore, I resort to the textual knowledge hidden in the huge amount of unstructured information produced by human beings. The research goals of this thesis include incorporating natural language processing techniques into several asset allocation models and finding the variables in financial models that naturally link to the contents of financial reports and to market sentiment. The new perspectives investigated in the thesis extend the current frameworks of the Markowitz model and the Black-Litterman model by rethinking asset expected returns and asset correlations. I try to give these two concepts new connotations. Both sub-symbolic and symbolic AI approaches are explored for semantic linkage and market view modeling, which are associated with key variables in asset allocation models. The introductory chapter reviews the types of financial texts; however, most existing approaches treat these heterogeneous information sources indiscriminately. I propose instead to consider separately the semantics conveyed in financial texts and the sentiment time series formulated from social media posts. Next, recent advances in the computational semantic representation of words and documents are leveraged to construct a dependence structure of financial assets. This structure, termed vine dependence, is useful for robustly estimating the covariance matrix of asset returns, which is a critical risk indicator for the combination of assets held by investors. A vine-growing algorithm is proposed, and a large vine structure for the main US stocks is constructed.
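The summary does not spell out how a vine turns into a covariance matrix. One standard route is sketched below, assuming the vine is a canonical vine (C-vine) whose partial correlations have already been assigned; how the thesis assigns them from semantic linkages is not stated here, so the input values are hypothetical. The recursive partial-correlation formula maps the vine to a full correlation matrix. Any choice of partial correlations strictly inside (-1, 1) yields a positive-definite matrix, which is what makes the construction attractive for robust estimation.

```python
import math
from typing import List

Matrix = List[List[float]]


def cvine_to_correlation(partial: Matrix) -> Matrix:
    """Map C-vine partial correlations to a full correlation matrix.

    partial[k][i] (k < i) is the partial correlation between assets k and i
    given assets 0..k-1, i.e. tree level k of a C-vine with roots 0, 1, 2, ...
    Only the strict upper triangle is read.
    """
    n = len(partial)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            p = partial[k][i]
            # Remove the conditioning assets l = k-1, ..., 0 one at a time
            # using the recursive partial-correlation formula.
            for l in range(k - 1, -1, -1):
                p = (p * math.sqrt((1 - partial[l][i] ** 2) * (1 - partial[l][k] ** 2))
                     + partial[l][i] * partial[l][k])
            corr[k][i] = corr[i][k] = p
    return corr


def correlation_to_covariance(corr: Matrix, vols: List[float]) -> Matrix:
    """Covariance from correlations and volatilities: Sigma_ij = rho_ij * s_i * s_j."""
    n = len(vols)
    return [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]


# Hypothetical three-asset C-vine; in the thesis setting these partial
# correlations would come from semantic linkages rather than being hand-set.
partial = [
    [1.0, 0.5, 0.3],
    [0.0, 1.0, 0.2],
    [0.0, 0.0, 1.0],
]
corr = cvine_to_correlation(partial)
cov = correlation_to_covariance(corr, [0.2, 0.3, 0.25])  # hypothetical volatilities
# corr[1][2] = 0.2 * sqrt(0.91 * 0.75) + 0.15, about 0.315
```

The resulting covariance matrix can then be plugged into a Markowitz mean-variance optimizer in place of the sample covariance estimated from price history.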
Furthermore, I study adding market sentiment to the posterior inference of asset expected returns. Specifically, sentic computing, a concept-level sentiment analysis method that takes advantage of syntactic features, is used to process mass opinion streams. A novel recurrent neural network design termed ECM-LSTM is used to form subjective investor views and is benchmarked against popular neural network architectures such as DENFIS and LSTM, and against forecasting models such as ARIMA and the Holt-Winters methods. The sentiment views make it possible to explain asset re-allocation decisions in a storytelling manner. Finally, as in many ambitious AI projects, the system needs maintenance to keep pace with demands and with accumulating commonsense knowledge, so that it need not start all over again. I discuss a method for continuously optimizing the polarity scores in a sentiment knowledge base with newly arriving information. A series of experiments was conducted to test portfolio performance, the validity of the sentiment time series, and model scalability. I find that robust estimation of asset correlations by semantic linkages is superior to estimation from historical price data, in the sense that, with the help of a proper semantic vine, the portfolio outperformed 80% to 90% of its peers in terms of annualized return. The improvement in annualized return is circa 2% from incorporating sentiment and more than 10% from employing ECM-LSTM. This thesis increases our understanding of how to systematically integrate textual knowledge into financial asset management.
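To make "adding market sentiment to the posterior inference of asset expected returns" concrete, the sketch below computes the standard Black-Litterman posterior mean in pure Python. It assumes the sentiment-based views have already been encoded as a pick matrix P, a view vector Q and a view-uncertainty matrix Omega; how the thesis derives these from sentic computing or ECM-LSTM forecasts is not described here, so all inputs are hypothetical, and tau = 0.05 is an illustrative default rather than a value from the thesis.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def black_litterman_mean(sigma: Matrix, pi: Vector, P: Matrix, Q: Vector,
                         omega: Matrix, tau: float = 0.05) -> Vector:
    """Posterior mean: mu = pi + tau*Sigma*P^T (P*tau*Sigma*P^T + Omega)^-1 (Q - P*pi)."""
    ts = [[tau * v for v in row] for row in sigma]     # tau * Sigma (n x n)
    ts_pt = matmul(ts, transpose(P))                   # tau * Sigma * P^T (n x k)
    m = matmul(P, ts_pt)                               # P * tau * Sigma * P^T (k x k)
    m = [[m[i][j] + omega[i][j] for j in range(len(m))] for i in range(len(m))]
    p_pi = [sum(P[i][j] * pi[j] for j in range(len(pi))) for i in range(len(P))]
    x = solve(m, [Q[i] - p_pi[i] for i in range(len(Q))])
    return [pi[i] + sum(ts_pt[i][j] * x[j] for j in range(len(x))) for i in range(len(pi))]


# Hypothetical example: two assets, one sentiment view that asset 0 will return 8%.
sigma = [[0.04, 0.01], [0.01, 0.09]]
pi = [0.05, 0.07]          # equilibrium (prior) expected returns
mu = black_litterman_mean(sigma, pi, P=[[1.0, 0.0]], Q=[0.08], omega=[[0.001]])
# mu is approximately [0.07, 0.075]
```

This form only solves a k x k system in the number of views, so it avoids inverting the full n x n covariance matrix; it is algebraically equivalent to the precision-weighted form [(tau*Sigma)^-1 + P^T Omega^-1 P]^-1 [(tau*Sigma)^-1 pi + P^T Omega^-1 Q]. A more confident view (smaller Omega) pulls the posterior further from pi toward Q.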