MDP-MAPPO: Multi-Drone Path Planning with Multi-Agent Proximal Policy Optimization for Digital Twin-Assisted Vehicular Edge Computing
Main article
Abstract
The rapid proliferation of latency-sensitive vehicular applications (autonomous driving perception, cooperative collision avoidance, real-time traffic management, and infotainment streaming) is creating task offloading demands that exceed the capacity of fixed roadside infrastructure in spatially heterogeneous traffic environments. Unmanned Aerial Vehicle (UAV)-assisted mobile edge computing has been proposed to supplement roadside unit (RSU) infrastructure with flexible, on-demand computing capacity, but the joint optimization of multi-UAV trajectories and task offloading ratios under dynamic vehicular channel conditions remains computationally intractable for centralized optimization approaches. This paper proposes MDP-MAPPO, a Multi-Drone Path Planning algorithm based on Multi-Agent Proximal Policy Optimization (MAPPO) that addresses this challenge through a digital twin (DT)-assisted multi-agent reinforcement learning framework. The system architecture integrates three innovations: (1) a digital twin edge server that maintains real-time virtual replicas of the vehicular network topology, channel states, and task queues, providing the MAPPO agents with accurate state information for coordinated decision-making; (2) a cooperative MAPPO framework in which the UAV agents are trained centrally with a shared critic that estimates the system-level value, while each agent executes its policy in a decentralized manner based on local observations; and (3) a joint optimization objective that simultaneously minimizes system latency and energy consumption through coordinated trajectory planning and offloading ratio adaptation. Simulation results demonstrate that MDP-MAPPO achieves a mean task latency of 112.8 ms and an energy consumption of 19.6 J, representing improvements of 43.2% and 46.8%, respectively, over the MADDPG baseline, and of 60.3% and 59.8% over single-agent PPO. The digital twin state prediction achieves 94.1% accuracy for channel modeling and 96.8% positional accuracy, substantially outperforming the baseline without a digital twin (78.6% and 81.3%, respectively). Scalability analysis demonstrates consistent performance improvements as the number of UAVs increases from 1 to 4, with diminishing returns thereafter due to inter-UAV coordination overhead.
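To make the centralized-training idea concrete, the following is a minimal Python sketch of how a shared critic's advantage estimate and a joint latency-energy reward could enter a PPO-style clipped objective. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names, the weights `w_latency` and `w_energy`, the normalization scales, and the clipping value `eps=0.2` are all hypothetical choices made for the example.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for a single sample.

    ratio is pi_new(a|o) / pi_old(a|o); eps=0.2 is an assumed value.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def joint_reward(latency_ms, energy_j, w_latency=0.5, w_energy=0.5,
                 latency_scale=100.0, energy_scale=10.0):
    """Hypothetical team reward: negative weighted sum of normalized
    latency and energy, so that lower cost yields higher reward.
    The weights and scales are illustrative assumptions.
    """
    cost = (w_latency * latency_ms / latency_scale
            + w_energy * energy_j / energy_scale)
    return -cost


def mappo_policy_loss(ratios, shared_advantage, eps=0.2):
    """Mean negative clipped surrogate over all UAV agents.

    Every agent is scored with the same advantage, standing in for
    the estimate produced by a shared, centrally trained critic.
    """
    total = sum(clipped_surrogate(r, shared_advantage, eps) for r in ratios)
    return -total / len(ratios)


if __name__ == "__main__":
    # Two UAV agents with a positive shared advantage.
    print(mappo_policy_loss([1.0, 1.5], shared_advantage=1.0))
    print(joint_reward(latency_ms=100.0, energy_j=10.0))
```

In this sketch the agents share a single scalar advantage, which is the simplest way to express a system-level value signal; at execution time each agent would only need its own policy and local observation, since the critic is used during training alone.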
