Data-Driven Capital Allocation in Financial Markets: Evidence from Reinforcement Learning and Risk-Aware Portfolio Optimization
Abstract
Capital allocation in modern financial markets is complicated by non-stationary return dynamics, abrupt regime shifts, asymmetric tail risk, and path-dependent transaction costs that traditional mean–variance optimisation is poorly equipped to address. This study develops and empirically evaluates a data-driven reinforcement learning framework for risk-aware portfolio optimisation that couples an asynchronous advantage actor–critic engine with a clipped proximal policy update. The proposed Dynamic Actor–Critic with Clipped Proximal Policy Optimisation (DAC-CPPO) agent is trained on two publicly accessible datasets obtained from Kaggle: the Intelligent Finance Assets dataset (7,000 records, 32 features) and the Massive Yahoo Finance dataset (approximately 603,000 records, 9 base features). Before training, the raw features are cleaned and standardised by Z-score normalisation, after which Linear Discriminant Analysis compresses redundant technical indicators into a compact, class-separable state vector. A Sharpe-based reward function encodes the risk–return trade-off directly into the policy gradient, while the clipping mechanism bounds probability-ratio updates to suppress destabilising swings in portfolio weights. Across a 250-trading-day back-test, DAC-CPPO attains a Sharpe ratio of 1.91, a cumulative return of 1.12, a realised volatility of 0.14, and a classification accuracy of 97.6%, with an MAE of 0.074 and an RMSE of 0.081. These results materially outperform mean–variance, tree-based, sequence-based, and baseline reinforcement learning benchmarks. An ablation study isolates the marginal contribution of each architectural element, and a sensitivity analysis on the clipping threshold identifies an interior optimum that reconciles policy stability with responsiveness to regime change. The findings provide both theoretical and practical guidance for deploying deep reinforcement learning in data-driven capital allocation pipelines subject to realistic market frictions.
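
To make the two mechanisms named in the abstract concrete, the following minimal sketch shows one common form of a Sharpe-based reward and of the clipped probability-ratio objective used in proximal policy optimisation. It is illustrative only and is not the DAC-CPPO implementation: the function names `sharpe_reward` and `clipped_surrogate`, the zero risk-free rate, the sample standard deviation, and the default clipping threshold of 0.2 are all assumptions introduced here, since the abstract does not state these details.

```python
import numpy as np


def sharpe_reward(returns, risk_free=0.0, eps=1e-8):
    """Sharpe-style reward over a window of per-period portfolio returns.

    Assumption: mean excess return divided by its sample standard deviation;
    eps guards against division by zero for a flat window.
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    return excess.mean() / (excess.std(ddof=1) + eps)


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximised).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sample
    advantage : advantage estimate for each sample
    clip_eps  : clipping threshold (0.2 is an assumed, commonly used value)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # The element-wise minimum removes any incentive to push the ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return np.minimum(unclipped, clipped).mean()


if __name__ == "__main__":
    window = [0.01, 0.02, 0.03]  # hypothetical daily returns
    print("Sharpe reward:", sharpe_reward(window))
    print("Clipped objective:", clipped_surrogate([1.5, 0.9], [1.0, -0.5]))
```

In this form, a probability ratio of 1.5 paired with a positive advantage contributes as if the ratio were 1.2, which is how clipping limits the size of any single reallocation of portfolio weights.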
