Abstract
Autonomous aerial vehicles (AAVs) equipped with wireless power transfer (WPT) technology offer a promising way to extend the lifetime of wireless rechargeable sensor networks (WRSNs) by swiftly recharging multiple sensor nodes. Existing strategies typically focus on optimizing the AAV trajectory and charging schedule while overlooking the diverse energy demands and spatial distribution of sensor nodes; many simply position the AAV at a fixed height above the center of the region. This can waste energy by unnecessarily recharging already adequately powered nodes near the AAV. A key challenge is therefore to optimize the 3-D position of the AAV so as to maximize the energy harvested by sensor nodes, accounting for their diverse energy demands and spatial distribution. To tackle this challenge, we propose a novel approach that jointly optimizes the 3-D deployment of the AAV and the energy harvested by sensor nodes. We formulate this complex, nonconvex problem as a Markov decision process (MDP) and solve it with deep reinforcement learning based on the soft actor-critic (SAC) algorithm. The AAV learns over continuous state and action spaces, enabling it to discover optimal hovering locations. Our novel reward function lets the AAV prioritize energy-critical sensor nodes and regions of higher sensor node density, maximizing energy harvesting within limited hovering times. Simulation results demonstrate that the proposed method significantly outperforms conventional AAV deployment strategies, substantially improving sensor node energy levels and network sustainability.
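To make the priority-weighting idea concrete, the following minimal Python sketch shows one way such a reward could be shaped: harvested power (under a toy free-space path-loss WPT model) is weighted by each node's energy deficit, so depleted nodes dominate the objective. All function names, the channel model, and the constants here are illustrative assumptions, not the paper's actual formulation.

```python
def harvested_power(p_tx, uav_pos, node_pos, eta=0.5):
    """Toy free-space WPT model: received power decays with the squared
    3-D distance between the AAV and the node (illustrative assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(uav_pos, node_pos))
    return p_tx * eta / max(d2, 1.0)  # clamp to avoid a singularity at d -> 0

def reward(uav_pos, nodes, p_tx=10.0, e_max=100.0):
    """Deficit-weighted reward: each node's harvested power is scaled by how
    far its residual energy is below capacity, so energy-critical nodes
    (and dense clusters of them) pull the AAV toward themselves."""
    total = 0.0
    for pos, residual in nodes:  # nodes: list of ((x, y, z), residual_energy)
        deficit = (e_max - residual) / e_max  # in [0, 1]; higher = more critical
        total += deficit * harvested_power(p_tx, uav_pos, pos)
    return total
```

Under this shaping, hovering above a depleted node yields a higher reward than hovering above an almost-full one at the same distance, which is the behavior the SAC agent is trained to exploit.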