Enhancing downstream ML performance with unconditional diffusion models for return predictions

Bibliographic Details
Main Author: Agarwala, Pratham
Other Authors: Bo An
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/175255
Institution: Nanyang Technological University
Description
Summary: This study addresses the challenge of enhancing model generalization in financial market return prediction, which is crucial given the dynamic and unpredictable nature of financial markets. Traditional models often fail to generalize across market conditions, largely because they cannot capture market dynamics effectively. Previous methods, which rely on simple transformations or on generative adversarial networks (GANs) with their inherent training instability, fall short of addressing these challenges. To bridge this gap, our research introduces an approach that leverages unconditional diffusion models with a self-guidance mechanism for conditioning during inference to perform financial data augmentation. We incorporate technical indicators to help the diffusion model comprehend financial time series, and we employ ensemble predictions for each context window to construct a stable output. We use a refinement process that leverages the implicit probability density learned by the diffusion model as a prior to iteratively improve the output. A filtration layer based on EMA and RSI, accompanied by some randomness, is added to provide more control over the fidelity and diversity of the augmented dataset. We evaluate augmented dataset quality through metrics assessing affinity (indistinguishability from real data) and diversity (coverage of real data): (1) Discriminative Score, which measures a classifier's ability to differentiate between real and synthetic data; (2) Mutual Information Coefficient, which measures the amount of information shared between the synthetic and real data; and (3) Variation in Information, which evaluates the diversity within the synthetic data. Findings indicate a direct link between these factors and model performance, with high-diversity, optimal-fidelity datasets enhancing forecasting accuracy. Our augmentation method outperformed GAN-based, mixup-based, and transformation-based techniques in predictive accuracy and diversity, and aligned closely with real market dynamics. Finally, we evaluate a real-life application by employing a simple trading strategy based on the downstream predictions; its backtest on the test data showed a significantly better Return on Investment and Sharpe Ratio when augmentation is applied, indicating better risk-adjusted returns.
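
The filtration layer mentioned in the summary can be illustrated with a minimal Python sketch. The project's actual implementation is not published here, so the indicator spans, acceptance bands, and acceptance probability below are assumptions chosen for illustration; only the idea of gating synthetic price windows on EMA and RSI with an added random acceptance term comes from the abstract.

    import numpy as np

    def ema(prices, span=10):
        """Exponentially weighted moving average (simple recursive form)."""
        alpha = 2.0 / (span + 1.0)
        out = np.empty(len(prices), dtype=float)
        out[0] = prices[0]
        for t in range(1, len(prices)):
            out[t] = alpha * prices[t] + (1.0 - alpha) * out[t - 1]
        return out

    def rsi(prices, period=14):
        """Relative Strength Index computed over the last `period` steps."""
        deltas = np.diff(prices)
        gains = np.clip(deltas, 0.0, None)[-period:]
        losses = np.clip(-deltas, 0.0, None)[-period:]
        avg_gain, avg_loss = gains.mean(), losses.mean()
        if avg_loss == 0.0:
            return 100.0
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    def passes_filter(window, ema_band=(0.95, 1.05), rsi_band=(20.0, 80.0),
                      accept_prob=0.9, rng=None):
        """Keep a synthetic price window only if its EMA and RSI look plausible,
        with a small random term so some off-band samples survive (diversity)."""
        if rng is None:
            rng = np.random.default_rng()
        ema_ratio = ema(window)[-1] / window[-1]
        ema_ok = ema_band[0] <= ema_ratio <= ema_band[1]
        rsi_ok = rsi_band[0] <= rsi(window) <= rsi_band[1]
        in_band = ema_ok and rsi_ok
        # In-band windows are kept with high probability; off-band windows
        # are still kept occasionally to avoid collapsing diversity.
        return rng.random() < (accept_prob if in_band else 1.0 - accept_prob)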
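A common way to compute a discriminative score of the kind listed among the evaluation metrics is to train a post-hoc classifier on real-versus-synthetic labels and report how far its test accuracy is from chance. The sketch below follows that convention under assumed choices (a random-forest classifier and a 70/30 split); it is not necessarily the authors' exact setup.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def discriminative_score(real_windows, synthetic_windows, seed=0):
        """Inputs are 2-D arrays of flattened windows (n_samples, window_len).
        A score near 0 means real and synthetic data are hard to tell apart
        (high affinity); 0.5 means they are perfectly separable."""
        X = np.vstack([real_windows, synthetic_windows])
        y = np.concatenate([np.ones(len(real_windows)),
                            np.zeros(len(synthetic_windows))])
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        clf = RandomForestClassifier(n_estimators=200, random_state=seed)
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        return abs(acc - 0.5)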
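The closing backtest can likewise be sketched as a simple sign-based long/short strategy on the downstream return predictions, reporting Return on Investment and an annualized Sharpe Ratio. The position rule and the daily annualization factor below are illustrative assumptions, not the thesis's exact trading rule.

    import numpy as np

    def backtest_sign_strategy(predicted_returns, realized_returns,
                               periods_per_year=252):
        """Go long when the model predicts a positive return, short otherwise,
        then report ROI and annualized Sharpe ratio on the realized returns."""
        predicted = np.asarray(predicted_returns, dtype=float)
        realized = np.asarray(realized_returns, dtype=float)
        positions = np.sign(predicted)              # +1 long, -1 short, 0 flat
        strategy_returns = positions * realized
        roi = np.prod(1.0 + strategy_returns) - 1.0  # cumulative return
        mean, std = strategy_returns.mean(), strategy_returns.std(ddof=1)
        sharpe = np.sqrt(periods_per_year) * mean / std if std > 0 else float("nan")
        return {"roi": roi, "sharpe": sharpe}

With this kind of harness, the comparison described in the abstract amounts to running the same strategy on predictions from models trained with and without the augmented data and comparing the two ROI/Sharpe pairs.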