Yan Luo*
School of Intelligent Logistics and Supply Chain, Sichuan Vocational and Technical College, Suining 629099, Sichuan, China
yan_yy2026@hotmail.com
Mingchun Qi
School of Intelligent Logistics and Supply Chain, Sichuan Vocational and Technical College, Suining 629099, Sichuan, China
Mingshan Cheng
School of Intelligent Logistics and Supply Chain, Sichuan Vocational and Technical College, Suining 629099, Sichuan, China

Abstract

Capital allocation in modern financial markets is complicated by non-stationary return dynamics, abrupt regime shifts, asymmetric tail risk, and path-dependent transaction costs that traditional mean–variance optimisation is poorly equipped to address. This study develops and empirically evaluates a data-driven reinforcement learning framework for risk-aware portfolio optimisation that couples an asynchronous advantage actor–critic engine with a clipped proximal policy update. The proposed Dynamic Actor–Critic with Clipped Proximal Policy Optimisation (DAC-CPPO) agent is trained on two publicly accessible datasets obtained from Kaggle: the Intelligent Finance Assets dataset (7,000 records, 32 features) and the Massive Yahoo Finance dataset (approximately 603,000 records, 9 base features). Prior to learning, Z-score normalisation cleans and rescales the raw features, after which Linear Discriminant Analysis compresses redundant technical indicators into a compact, class-separable state vector. A Sharpe-based reward function encodes the risk–return trade-off directly into the policy gradient, while the clipping mechanism bounds probability-ratio updates to suppress destabilising swings in portfolio weights. Across a 250-trading-day back-test, DAC-CPPO attains a Sharpe ratio of 1.91, a cumulative return of 1.12, a realised volatility of 0.14, and classification accuracy of 97.6% with an MAE of 0.074 and RMSE of 0.081, materially outperforming mean–variance, tree-based, sequence-based, and baseline reinforcement learning benchmarks. An ablation study isolates the marginal contribution of each architectural element, and a sensitivity analysis on the clipping threshold identifies an interior optimum that reconciles policy stability with responsiveness to regime change. The findings provide both theoretical and practical guidance for deploying deep reinforcement learning in data-driven capital allocation pipelines subject to realistic market frictions.
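The two core ingredients named in the abstract, a Sharpe-based reward and a clipped probability-ratio update, can be illustrated with a minimal sketch. The function names (`sharpe_reward`, `clipped_surrogate`) and the clip threshold value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sharpe_reward(returns, eps=1e-8):
    # Sharpe-style reward: mean return divided by return volatility,
    # encoding the risk-return trade-off directly into the reward signal.
    r = np.asarray(returns, dtype=float)
    return r.mean() / (r.std() + eps)

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    # PPO-style clipped objective: the probability ratio is bounded to
    # [1 - clip_eps, 1 + clip_eps], suppressing destabilising swings in
    # the policy (and hence in portfolio weights). To be maximised.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)

# A large ratio with positive advantage is capped at (1 + clip_eps) * A.
print(clipped_surrogate(np.array([1.5]), np.array([2.0])))  # -> [2.4]
print(sharpe_reward([0.01, -0.005, 0.02]))
```

The interior optimum for the clipping threshold reported in the sensitivity analysis corresponds to tuning `clip_eps`: smaller values stabilise updates, larger values let the policy react faster to regime change.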

How to Cite

Luo, Y., Qi, M., & Cheng, M. (2024). Data-Driven Capital Allocation in Financial Markets: Evidence from Reinforcement Learning and Risk-Aware Portfolio Optimization. Journal of Business and Data Analytics, 2(3), 1-27. https://doi.org/10.63646/jbda.2024.020301