MDP-MAPPO: Multi-Drone Path Planning with Multi-Agent Proximal Policy Optimization for Digital Twin-Assisted Vehicular Edge Computing

Tariq Mahmood; Syed Ali Hassan; Adnan Akhunzada; Zubair Ahmad

doi:10.63646/jiic.2025.050201

Published 2025-06-30

Tariq Mahmood*

Department of Electrical Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan
tariqmahmood@comsats.edu.pk

Syed Ali Hassan

Department of Electrical Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan

Adnan Akhunzada

School of Information Technology, Deakin University, Melbourne VIC 3125, Australia

Zubair Ahmad

Department of Electrical Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan

Abstract

The rapid proliferation of latency-sensitive vehicular applications (autonomous driving perception, cooperative collision avoidance, real-time traffic management, and infotainment streaming) is creating task offloading demands that exceed the capacity of fixed roadside infrastructure in spatially heterogeneous traffic environments. Unmanned Aerial Vehicle (UAV)-assisted mobile edge computing has been proposed to supplement roadside unit (RSU) infrastructure with flexible, on-demand computing capacity, but the joint optimization of multi-UAV trajectories and task offloading ratios under dynamic vehicular channel conditions remains computationally intractable for centralized optimization approaches. This paper proposes MDP-MAPPO, a Multi-Drone Path Planning algorithm based on Multi-Agent Proximal Policy Optimization (MAPPO) that addresses this challenge through a digital twin (DT)-assisted multi-agent reinforcement learning framework. The system architecture integrates three innovations: (1) a digital twin edge server that maintains real-time virtual replicas of the vehicular network topology, channel states, and task queues, providing the MAPPO agents with accurate state information for coordinated decision-making; (2) a cooperative MAPPO framework in which the UAV agents are trained centrally with a shared critic that estimates system-level value, while each agent executes its policy in a decentralized manner based on local observations; and (3) a joint optimization objective that simultaneously minimizes system latency and energy consumption through coordinated trajectory planning and offloading ratio adaptation. Simulation results show that MDP-MAPPO achieves a mean task latency of 112.8 ms and an energy consumption of 19.6 J, improvements of 43.2% and 46.8%, respectively, over the MADDPG baseline, and of 60.3% and 59.8% over single-agent PPO. The digital twin state prediction achieves 94.1% accuracy for channel modeling and 96.8% positional accuracy, substantially outperforming the no-DT baseline (78.6% and 81.3%, respectively). Scalability analysis shows consistent performance improvements as the UAV count grows from 1 to 4 drones, with diminishing returns thereafter due to inter-UAV coordination overhead.
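
The two ingredients named in the abstract, a joint latency and energy objective and centralized training with a shared critic, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal weights, the normalization scales, and the one-step advantage estimate are all assumptions made for illustration, since the abstract does not specify them. Only the clipped surrogate term follows the standard PPO form.

```python
import math


def joint_reward(latency_ms, energy_j, w_latency=0.5, w_energy=0.5,
                 latency_scale=100.0, energy_scale=10.0):
    """Team reward as a negative weighted cost (weights and scales are assumed).

    Lower latency and lower energy both yield a higher (less negative) reward.
    """
    cost = (w_latency * latency_ms / latency_scale
            + w_energy * energy_j / energy_scale)
    return -cost


def shared_advantages(team_rewards, values, gamma=0.99):
    """One-step TD advantages computed from a shared critic's value estimates.

    `values` must hold one more entry than `team_rewards` (the bootstrap value).
    Every UAV agent would reuse these advantages during centralized training.
    """
    return [r + gamma * values[t + 1] - values[t]
            for t, r in enumerate(team_rewards)]


def ppo_clip_term(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate term for a single sample."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical two-step rollout; the numbers are placeholders, not results.
    rewards = [joint_reward(120.0, 20.0), joint_reward(110.0, 19.0)]
    critic_values = [-1.5, -1.4, 0.0]  # assumed shared-critic outputs
    advantages = shared_advantages(rewards, critic_values)
    # Each agent applies the same advantage to its own local policy ratio.
    for adv in advantages:
        print(ppo_clip_term(new_logp=-0.9, old_logp=-1.0, advantage=adv))
```

At execution time only each agent's local policy would be evaluated; the shared critic and the team reward in this sketch are used during training only, which is the centralized-training, decentralized-execution split the abstract describes.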

Keywords: reinforcement learning; vehicular edge computing; multi-agent; UAV path planning; digital twin; MAPPO; task offloading

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Mahmood, T., Hassan, S. A., Akhunzada, A., & Ahmad, Z. (2025). MDP-MAPPO: Multi-Drone Path Planning with Multi-Agent Proximal Policy Optimization for Digital Twin-Assisted Vehicular Edge Computing. Journal of Intelligent Industrial Convergence, 5(2), 1-10. https://doi.org/10.63646/jiic.2025.050201
