History Is Not Enough: adaptive financial data augmentation with a curriculum planner

Bibliographic Details
Main Author: Teng, Yao Long
Other Authors: Bo An
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/181011
Institution: Nanyang Technological University
Description
Summary: In quantitative finance, a key challenge is the discrepancy between training performance and real-world performance, driven largely by concept drift. Because of overfitting, models that achieve high accuracy on training data frequently fail to generalise to unseen data, which diminishes their practical utility in live market conditions. Historical data alone cannot capture the complexity and unpredictability of financial markets; the adage “History Is Not Enough” captures the need to further manipulate historical data to address this shortfall. Furthermore, existing data augmentation techniques have struggled to adapt effectively to financial time series, and the workflow for applying synthetic data to downstream financial tasks has not been thoroughly explored. To address these research gaps, we propose a novel workflow that integrates augmentation with an adaptive curriculum to handle uncertainty in downstream tasks. Our approach includes a data manipulation module that uses single-stock transformation, multi-stock mix-up, and data curation techniques to synthesise diverse, high-quality financial data. The curriculum planner dynamically adjusts the manipulation of training samples based on the state of the data and the task model. Experimental results show that our plug-and-play workflow is both model-agnostic and task-independent, improving performance and mitigating the risk of suboptimal decision-making in dynamic market environments.
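The summary names the workflow's components without describing how they are implemented. Purely as an illustration, the following is a minimal sketch of generic versions of three of those ideas, assuming a stock series is a plain list of floats: a single-stock transformation (random rescaling plus Gaussian jitter), a multi-stock mix-up (a convex combination of two aligned series with a Beta-distributed weight, as in standard mixup), and a toy curriculum rule that changes how often augmentation is applied based on validation loss. The names `jitter_and_scale`, `mixup` and `ToyCurriculum`, as well as all parameters and defaults, are hypothetical. This is not the thesis's method, and its actual planner is likely more involved than this rule.

```python
import random
from typing import List, Optional, Tuple


def jitter_and_scale(series: List[float], noise_sigma: float = 0.01,
                     scale_sigma: float = 0.05,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical single-stock transformation: rescale the whole series by
    one random factor, then add independent Gaussian noise to each point."""
    if rng is None:
        rng = random.Random()
    factor = rng.gauss(1.0, scale_sigma)
    return [x * factor + rng.gauss(0.0, noise_sigma) for x in series]


def mixup(series_a: List[float], series_b: List[float], alpha: float = 0.4,
          rng: Optional[random.Random] = None) -> Tuple[List[float], float]:
    """Hypothetical multi-stock mix-up: convex combination of two aligned
    series with a Beta(alpha, alpha) mixing weight lam."""
    if len(series_a) != len(series_b):
        raise ValueError("series must be aligned to the same length")
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(series_a, series_b)]
    return mixed, lam


class ToyCurriculum:
    """Hypothetical stand-in for a curriculum planner: raise the augmentation
    probability while validation loss improves, lower it when it stalls."""

    def __init__(self, p: float = 0.1, step: float = 0.05):
        self.p = p                      # probability of augmenting a sample
        self.step = step                # adjustment per update
        self.best_loss = float("inf")   # best validation loss seen so far

    def update(self, val_loss: float) -> float:
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.p = min(1.0, self.p + self.step)
        else:
            self.p = max(0.0, self.p - self.step)
        return self.p


if __name__ == "__main__":
    rng = random.Random(0)
    a = [100.0, 101.0, 102.0]
    b = [50.0, 49.0, 51.0]
    mixed, lam = mixup(a, b, rng=rng)
    print(lam, mixed)
    print(jitter_and_scale(a, rng=rng))
```

In a mixup setting, the returned `lam` would typically also be used to mix the two samples' labels. The planner's update rule here is an arbitrary placeholder that reacts only to validation loss. The summary, by contrast, describes a planner that conditions on the state of both the data and the task model.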