Market-Adaptive Stock Trading through B-WEMA Driven Proximal Policy Optimization
Mulia Ichsan, Amalia Zahra



Introduction

This work proposes a market-adaptive stock trading model that combines Brown's Weighted Exponential Moving Average (B-WEMA) with Proximal Policy Optimization (PPO). The resulting deep reinforcement learning agent delivers stable, risk-controlled returns, outperforming its benchmarks with a cumulative return of 23.43%.


Abstract

Developing automated trading strategies that achieve stable returns while controlling risk remains a central challenge in quantitative finance. Many reinforcement learning-based trading systems focus on reward maximization but provide limited justification for the choice of forecasting indicators and often lack comprehensive benchmarking against alternative strategies and risk measures. This paper addresses the problem of integrating a statistically grounded price-smoothing technique with a policy optimization scheme to improve sequential trading decisions under market uncertainty. We propose a hybrid model that combines Brown’s Weighted Exponential Moving Average (B-WEMA) as a trend-sensitive forecasting indicator with a deep reinforcement learning agent trained using Proximal Policy Optimization (PPO). The role of B-WEMA is to provide structured price signals that reduce noise sensitivity, while PPO determines buy and sell actions through policy updates constrained for stable learning. The performance of the proposed model is evaluated over a 10-month trading horizon and compared with a buy-and-hold benchmark and an alternative reinforcement learning method, Advantage Actor-Critic (A2C), both implemented under the same experimental conditions. Empirical results show that the proposed B-WEMA-PPO framework achieved a cumulative return of 23.43% over the test period, outperforming both the benchmark and the A2C-based agent. In addition to cumulative return, risk-adjusted performance metrics, namely volatility and maximum drawdown, are reported to provide a balanced assessment of profitability and risk exposure. These findings suggest that incorporating structured exponential smoothing into policy optimization may enhance the stability and effectiveness of reinforcement learning-based trading strategies.
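The abstract does not spell out B-WEMA's equations. As a hedged illustration only, the sketch below assumes the formulation commonly attributed to B-WEMA: Brown's double smoothing applied with weighted moving averages, i.e. two successive WMA passes followed by level (a) and trend (b) extrapolation, F_{t+m} = a_t + b_t·m. The function names, the window size `n`, and the trend coefficient are assumptions, not details taken from the paper.

```python
import numpy as np

def wma(x, n):
    """Weighted moving average with linearly increasing weights 1..n
    (heaviest weight on the most recent observation)."""
    w = np.arange(1, n + 1, dtype=float)
    w /= w.sum()
    # Reversing the kernel makes np.convolve apply w left-to-right,
    # so x[t] (the latest point in each window) gets weight w[n-1].
    return np.convolve(np.asarray(x, dtype=float), w[::-1], mode="valid")

def b_wema_forecast(prices, n=5, m=1):
    """Sketch of a B-WEMA-style m-step-ahead forecast: double WMA
    smoothing with Brown's level/trend extrapolation F = a + b*m."""
    s1 = wma(prices, n)                  # first smoothing pass
    s2 = wma(s1, n)                      # second smoothing pass
    s1 = s1[-len(s2):]                   # align the two series
    a = 2.0 * s1 - s2                    # level estimate
    b = (2.0 / (n - 1)) * (s1 - s2)      # trend estimate (Brown's form)
    return a + b * m                     # m-step-ahead forecast
```

On a noiseless linear trend the double-smoothed level `a` recovers the current price exactly, which is the noise-reduction property the abstract attributes to B-WEMA.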


Review

This paper presents a compelling approach to developing more robust and risk-aware automated stock trading strategies, addressing critical limitations often found in purely reinforcement learning (RL) based systems. The authors tackle the pertinent issue of achieving stable returns while effectively controlling risk, a central challenge in quantitative finance. Their core contribution lies in proposing a novel hybrid model that integrates Brown’s Weighted Exponential Moving Average (B-WEMA) with Proximal Policy Optimization (PPO). This conceptual framework is appealing as it aims to provide a statistically grounded method for generating trend-sensitive price signals, thereby reducing noise sensitivity, while leveraging the stable learning capabilities of PPO for sequential decision-making. The motivation behind combining a traditional smoothing technique with advanced RL is well-articulated, promising enhanced stability and effectiveness.

Methodologically, the paper outlines a clear and thoughtful experimental design. The choice of B-WEMA for providing structured price signals is a strategic move to mitigate the inherent noise in financial time series, which can often destabilize RL agents. PPO, known for its constrained policy updates, is an appropriate choice for promoting stable learning in a volatile market environment. The evaluation strategy includes a 10-month trading horizon, which, while not extensively long-term, serves as a solid basis for initial assessment. Crucially, the model is benchmarked not only against a naive buy-and-hold strategy but also against an alternative deep reinforcement learning agent, Advantage Actor-Critic (A2C), under identical experimental conditions. This direct comparison against a contemporary RL method strengthens the validity of the proposed framework, especially given the inclusion of risk-adjusted performance metrics alongside cumulative returns.
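The "constrained policy updates" the review credits for PPO's stability refer to PPO's clipped surrogate objective, which bounds the probability ratio between the new and old policy so a single gradient step cannot move the policy too far. A minimal NumPy sketch of that clipping (a generic illustration, not the authors' implementation):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective:
    mean over samples of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) is the probability ratio."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the minimum makes the bound pessimistic: large ratio
    # changes yield no extra objective gain, which caps the update size.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In a volatile market environment, this cap prevents one unusually profitable (or unprofitable) batch of trades from causing a destabilizing policy jump.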
The empirical results reported are promising, demonstrating that the B-WEMA-PPO framework achieved a notable 23.43% cumulative return over the test period, significantly outperforming both the buy-and-hold benchmark and the A2C-based agent. The inclusion of volatility and maximum drawdown as risk-adjusted performance metrics provides a balanced perspective, underscoring the potential for both profitability and risk control. These findings strongly suggest that integrating structured exponential smoothing techniques like B-WEMA into policy optimization schemes can indeed enhance the stability and effectiveness of RL-based trading strategies. While the 10-month horizon warrants further validation over longer and more diverse market conditions, this study provides a valuable step forward in developing more sophisticated and reliable automated trading systems.
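The two risk metrics named above have standard definitions: volatility is the standard deviation of per-period returns (often annualized), and maximum drawdown is the largest peak-to-trough decline of the equity curve. A short sketch of both, assuming daily returns and 252 trading days per year (conventions, not values from the paper):

```python
import numpy as np

def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of per-period returns,
    scaled by the square root of periods per year."""
    return float(np.std(returns, ddof=1) * np.sqrt(periods_per_year))

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve,
    returned as a (negative) fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(equity)   # running maximum so far
    drawdowns = (equity - peaks) / peaks    # decline relative to peak
    return float(drawdowns.min())
```

Reporting both alongside the 23.43% cumulative return is what allows the balanced profitability-versus-risk assessment the review highlights.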


Full Text

The full text of this article, "Market-Adaptive Stock Trading through B-WEMA Driven Proximal Policy Optimization", is available from Building of Informatics, Technology and Science (BITS).

