Description
One of the main challenges that satellites face is the progressive accumulation of debris in low Earth orbit (LEO). This motivates new strategies for debris removal, as well as for servicing and refuelling existing satellites to extend their lifespan.
This article proposes a Deep Reinforcement Learning (DRL) framework to optimize the trajectory of a chaser satellite tasked with retrieving space debris or servicing other spacecraft. The experiments were conducted in a simulated environment containing multiple debris objects.
The proposed approach handles imperfect environmental modelling and noisy measurements by framing the task as a Partially Observable Markov Decision Process (POMDP). Hidden state information is replaced by a belief derived from the observation history, which a Long Short-Term Memory (LSTM) network encodes into a fixed-length sequence. A Transformer encoder then weights this sequence via self-attention to capture the non-linear dynamics of the signals. The resulting semantic history is consumed by an agent trained with Proximal Policy Optimization (PPO), an on-policy policy-gradient method. PPO relies on two neural networks: a critic for value estimation and an actor that outputs the policy, implemented as either Multi-Layer Perceptrons (MLPs) or 1D Convolutional Neural Networks (CNNs) to exploit temporal information.
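To make the pipeline concrete, the following is a minimal PyTorch sketch of the belief-encoding and actor-critic components described above. All dimensions, layer counts, and names (obs_dim, hidden_dim, act_dim, and so on) are illustrative assumptions, not the article's actual architecture or hyperparameters.

```python
# Minimal sketch of the LSTM + Transformer belief encoder feeding PPO
# actor-critic heads. Every size below is an assumed placeholder.
import torch
import torch.nn as nn

class BeliefEncoder(nn.Module):
    """LSTM compresses the observation history into a fixed-length latent
    sequence; a Transformer encoder then re-weights it with self-attention."""
    def __init__(self, obs_dim=12, hidden_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, obs_history):          # (batch, seq_len, obs_dim)
        h, _ = self.lstm(obs_history)        # latent sequence over the history
        z = self.transformer(h)              # attention-weighted history
        return z.mean(dim=1)                 # pooled "semantic history"

class ActorCritic(nn.Module):
    """PPO heads: the actor parameterizes a Gaussian policy over thrust
    commands, the critic estimates the state value."""
    def __init__(self, feat_dim=64, act_dim=3):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.critic = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, feat):
        mean = self.actor(feat)              # policy mean (thrust command)
        value = self.critic(feat)            # state-value estimate
        return mean, self.log_std.exp(), value
```

The 1D-CNN variant mentioned above would replace the MLP heads with convolutions over the latent sequence instead of pooling it first.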
The model considers the motion of the satellite and debris in LEO under J2 perturbation and atmospheric drag. The reward function is designed to achieve rendezvous with the debris while minimizing fuel consumption and manoeuvre duration and maintaining an optimal relative velocity.
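A reward of this form could, for instance, combine the stated objectives as a weighted sum. The terms, weights (w_*), and reference velocity v_ref below are an illustrative sketch under assumed values, not the article's actual reward function.

```python
# Illustrative reward shaping consistent with the stated objectives:
# rendezvous distance, fuel use, manoeuvre duration, relative velocity.
import numpy as np

def reward(rel_pos, rel_vel, dv_used, dt,
           w_dist=1.0, w_fuel=0.1, w_time=0.01, w_vel=0.5, v_ref=0.1):
    r = -w_dist * np.linalg.norm(rel_pos)               # close the gap to the target
    r -= w_fuel * dv_used                               # penalize delta-v (fuel) spent
    r -= w_time * dt                                    # penalize manoeuvre duration
    r -= w_vel * abs(np.linalg.norm(rel_vel) - v_ref)   # approach at a safe speed
    return r
```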
Case studies based on available debris-tracking data demonstrate the efficacy of Transformer-based DRL in improving the precision, efficiency, and safety of Active Debris Removal (ADR) and In-Orbit Servicing (IOS) missions. The article concludes with a discussion of the future potential of DRL in advancing autonomous space operations and ensuring long-term space sustainability.