16–19 Oct 2023
ESTEC
Europe/Paris timezone

CNN4NEOOD - CONVOLUTIONAL NEURAL NETWORK FOR NEAR EARTH OBJECT OBSERVATION AND DETECTION

17 Oct 2023, 16:20
20m
ESTEC

Keplerlaan 1, 2201 AZ Noordwijk, The Netherlands
End-of-Life Management & Zero Debris

Speakers

Ms Aiswarya Unni (Tr2 srls), Mauro Venanzi (UrbyetOrbit srls)

Description

**CNN4NEOOD Abstract Proposal for Clean Space Industry Days**

Application of Artificial Intelligence in space domains is gaining prominent interest due to the increasing demand for services and the growing number of in-orbit satellites, with a consequent increase in proximity operations and in the need to mitigate the risks posed by space debris and non-cooperative targets. The goal is to provide a complete solution that integrates image-based Meta Reinforcement Learning to actively recognize and track non-cooperative spacecraft and space debris. This has the potential to revolutionize object detection and tracking in space, safeguarding critical space assets and ensuring enhanced safety in future space missions.

CONVOLUTIONAL NEURAL NETWORK FOR NEAR EARTH OBJECT OBSERVATION AND DETECTION (CNN4NEOOD) aims to create a robust and flexible AI algorithm able to work under different lighting conditions, with damaged targets, occlusions, and other effects that might degrade data quality, in order to simulate an in-space detection environment. The algorithm will be trained with Meta Reinforcement Learning, modelled on how human beings learn by relating to past experiences via sequential observation, in particular with videos; the method is active, with humans in the loop. Convolutional Neural Networks (CNNs) are used for processing the acquired images and extracting relevant features. Recurrent Neural Networks (RNNs) are capable of mapping temporal relationships in a sequence of inputs to the output, a property that makes them suitable for use within the Meta RL framework.

Observation method

Equipped with low technology readiness level camera sensors, optical or multispectral, the simulated environment uses available datasets and videos to train the algorithm. Stereophotoclinometry and stereophotogrammetry are executed with mixed input sources, optical and multispectral, to obtain a 3D model from imagery. These 3D reconstruction techniques are used to enhance the understanding of the environment.
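
As an illustration of the photogrammetric step, here is a minimal sketch of recovering 3D points from a matched stereo pair with OpenCV's triangulatePoints; the projection matrices and pixel correspondences below are placeholder values, not outputs of the actual pipeline.

```python
# Minimal sketch: triangulating 3D points from a matched stereo pair.
# P1, P2 and the matched pixel coordinates are illustrative placeholders;
# a real pipeline would obtain them from camera calibration and matching.
import numpy as np
import cv2

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                  # left camera at origin
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # hypothetical baseline

pts_left = np.array([[320.0, 400.0], [240.0, 260.0]])   # 2xN matched points, left view
pts_right = np.array([[310.0, 390.0], [240.0, 260.0]])  # 2xN matched points, right view

pts_4d = cv2.triangulatePoints(P1, P2, pts_left, pts_right)  # homogeneous 4xN
pts_3d = (pts_4d[:3] / pts_4d[3]).T                          # normalise to 3D
print(pts_3d)
```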

Principle applied

Partially Observable Markov Decision Process (POMDP)

A reinforcement learning problem can be modelled as a Markov Decision Process (MDP) when it satisfies the Markov property. In scenarios where the state cannot be directly observed or is affected by noise, the MDP becomes a Partially Observable Markov Decision Process (POMDP). In a POMDP, the underlying state of the environment is hidden, and the agent receives observations through an observation function that provides indirect information about that state.

In the context of image-based RL, the observations contain embedded information about the state, but the agent does not have access to the actual state. Instead, it relies on these observations to make decisions. For example, in space debris tracking using images, the agent (a satellite) receives images from its onboard camera, but the true state of the debris (its exact position and velocity) remains unknown. Deep reinforcement learning with recurrent neural networks (RNNs) can be employed to tackle the challenges of partial observability and make effective decisions in such environments.
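
As a minimal sketch of this idea, a CNN can encode each frame and a recurrent core can accumulate those features into a belief-like hidden state; the layer sizes and the 84x84 grayscale input are our own illustrative assumptions, not the proposal's architecture.

```python
# Minimal sketch: CNN per-frame encoder + GRU over the observation history,
# so the hidden state summarises past frames when the true state is hidden.
import torch
import torch.nn as nn

class RecurrentEncoder(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # CNN extracts spatial features from each 84x84 grayscale frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # GRU aggregates per-frame features into a belief-like hidden state.
        self.rnn = nn.GRU(input_size=32 * 9 * 9, hidden_size=hidden_size,
                          batch_first=True)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.reshape(b * t, *frames.shape[2:]))
        out, hidden = self.rnn(feats.reshape(b, t, -1), hidden)
        return out, hidden  # out[:, -1] approximates the current belief

enc = RecurrentEncoder()
obs = torch.randn(2, 5, 1, 84, 84)  # two sequences of five frames
belief, _ = enc(obs)
print(belief.shape)                  # torch.Size([2, 5, 128])
```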

State: The system employs computer vision techniques to estimate the state of the tracked object, including its position, velocity, and possibly other relevant attributes. This estimated state information is fed into the PPO framework to make decisions.

Action: At each time step, the actor network selects an action based on the current state, and the tracking platform executes the chosen action to update its viewpoint or trajectory.

Reward Function: The reward function guides the learning process. It is designed to encourage the tracking system to follow the object accurately and maintain proximity to it, and to penalize deviations from the object's predicted trajectory with a penalty weight, as sketched below.
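
A minimal sketch of such a reward, assuming a simple distance-based formulation; the weights and the use of Euclidean position error are our own illustrative choices, not values from the proposal.

```python
# Minimal sketch: reward accurate, close tracking and penalise deviation
# from the object's predicted trajectory. Weights are illustrative.
import numpy as np

def tracking_reward(est_pos, target_pos, predicted_pos, w_track=1.0, w_dev=0.5):
    tracking_error = np.linalg.norm(est_pos - target_pos)   # follow accurately
    deviation = np.linalg.norm(est_pos - predicted_pos)     # stay on prediction
    # Negative costs: smaller errors give higher (less negative) reward.
    return -w_track * tracking_error - w_dev * deviation

r = tracking_reward(np.array([1.0, 2.0]), np.array([1.2, 2.1]),
                    np.array([1.1, 2.0]))
print(r)
```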

Meta Reinforcement Learning (Meta RL)

In traditional reinforcement learning, the agent interacts with the environment and collects data (observations, actions, and rewards) to update its policy during training. The agent learns a policy that maps states to actions in order to maximize cumulative rewards over time. A CNN is used for feature extraction. In our case we also employ an RNN to capture temporal dependencies in the image data, which is crucial for estimating the current state.

An actor-critic framework is used, in which two deep neural networks run in parallel to update the policy. The actor network is responsible for selecting actions (adjusting camera angles or the trajectory) based on the input data and its internal state. The critic network evaluates the performance of the chosen actions and provides feedback to the actor to improve its decisions.
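
A minimal sketch of the two networks, assuming a 128-dimensional feature vector (for example from the recurrent encoder sketched earlier) and a small, hypothetical discrete set of camera-pointing actions.

```python
# Minimal sketch: actor and critic heads over a shared feature vector.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, feature_size=128, n_actions=4):
        super().__init__()
        # Actor: maps belief features to a distribution over actions
        # (e.g. discrete camera-pointing adjustments).
        self.actor = nn.Sequential(nn.Linear(feature_size, 64), nn.Tanh(),
                                   nn.Linear(64, n_actions))
        # Critic: estimates the value of the current belief state.
        self.critic = nn.Sequential(nn.Linear(feature_size, 64), nn.Tanh(),
                                    nn.Linear(64, 1))

    def forward(self, features):
        dist = torch.distributions.Categorical(logits=self.actor(features))
        value = self.critic(features).squeeze(-1)
        return dist, value

ac = ActorCritic()
dist, value = ac(torch.randn(2, 128))
print(dist.sample(), value)  # actor samples actions; critic scores states
```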

Training Methods

Meta-RL involves training an agent (a deep neural network) to learn how to learn more effectively in a reinforcement learning setting, introducing an additional level of learning on top of standard RL. Instead of directly learning a policy for the object detection task, the agent learns how to adapt or learn new policies more efficiently and effectively when faced with new, unseen object detection tasks. This is achieved by exposing the agent to a set of object detection training tasks during a meta-training phase. During the meta-testing phase, the agent is evaluated on its ability to perform well on new, unseen object detection tasks that were not encountered during meta-training. By leveraging meta-learning, the agent can become better at learning from limited data, which is particularly crucial in space applications, where acquiring large labeled datasets is challenging.
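
A minimal sketch of this two-level loop, using a first-order (Reptile-style) meta-update in place of the proposal's unspecified meta-learning algorithm; the task sampler, inner loss, and step sizes are placeholders.

```python
# Minimal sketch: inner loop adapts to one sampled task; outer loop moves
# the meta-parameters toward the adapted parameters (Reptile-style).
import copy
import torch
import torch.nn as nn

def sample_task():
    # Placeholder: each "task" would be a distinct detection scenario
    # (different target, lighting, occlusion pattern, ...).
    return torch.randn(32, 10), torch.randn(32, 1)

model = nn.Linear(10, 1)  # stand-in for the full CNN+RNN agent
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for meta_iter in range(100):
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    x, y = sample_task()
    for _ in range(inner_steps):  # inner loop: adapt to this task
        loss = nn.functional.mse_loss(task_model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # outer loop: meta-update
        for p, q in zip(model.parameters(), task_model.parameters()):
            p += meta_lr * (q - p)
```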

Proximal Policy Optimization (PPO) is employed as the reinforcement learning algorithm: a state-of-the-art on-policy algorithm designed to perform well on problems where no explicit model of the environment is available.
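
The core of PPO is its clipped surrogate objective, which keeps each policy update close to the policy that collected the data; a minimal sketch with placeholder batch values follows.

```python
# Minimal sketch: PPO's clipped surrogate policy loss.
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic bound, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()

loss = ppo_policy_loss(torch.tensor([-1.0, -0.5]),
                       torch.tensor([-1.1, -0.4]),
                       torch.tensor([0.8, -0.3]))
print(loss)
```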

The simulation environment will simulate image acquisition from onboard cameras. Visual data will be captured from sensors mounted on a moving platform, such as a satellite or robotic arm. The goal is to actively track objects in real time to estimate their state and improve tracking accuracy. The concept of Extended Simultaneous Localization and Mapping (ESLAM) is applied: the agent estimates a point of reference based on its own position and returns a reconstructed map of the surrounding environment, obtained in this case via visual in-space observation and with particular attention to incoming objects.
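
A minimal sketch of what such a simulated environment could look like behind a gym-style reset/step interface; the dynamics, observation model, and reward here are trivial placeholders, not the project's simulator.

```python
# Minimal sketch: a toy image-based debris-tracking environment.
import numpy as np

class DebrisTrackingEnv:
    def reset(self):
        self.debris = np.random.uniform(-1, 1, size=2)  # hidden true state
        self.camera = np.zeros(2)                       # pointing direction
        return self._render()

    def step(self, action):
        self.camera += np.clip(action, -0.1, 0.1)        # adjust pointing
        self.debris += np.random.normal(0, 0.01, size=2) # debris drifts
        reward = -np.linalg.norm(self.camera - self.debris)
        return self._render(), reward, False, {}

    def _render(self):
        # Placeholder "image": noisy 8x8 intensity map peaked at the
        # debris position relative to the camera axis.
        g = np.linspace(-1, 1, 8)
        gx, gy = np.meshgrid(g, g)
        off = self.debris - self.camera
        img = np.exp(-((gx - off[0])**2 + (gy - off[1])**2) / 0.1)
        return img + np.random.normal(0, 0.05, img.shape)

env = DebrisTrackingEnv()
obs = env.reset()
obs, reward, done, info = env.step(np.array([0.05, -0.02]))
```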

Conclusions

Our project proposal CNN4NEOOD holds significant importance for the space industry due to the increasing demand for space surveillance: the growing number of in-orbit satellites leads to a higher risk from space debris and non-cooperative targets. This poses a considerable challenge for space missions, as accurate detection and tracking of objects are essential for ensuring the safety of spacecraft and the efficiency of space operations, underlining the need for a robust AI algorithm capable of real-time object detection and tracking. By integrating image-based Meta RL, the algorithm will actively improve the learning, recognition, and tracking of non-cooperative space objects, even under varying lighting conditions and uncertainties such as unknown observations, as is often the case in space debris tracking.

Primary author

Mauro Venanzi (UrbyetOrbit srls)

Co-author

Ms Aiswarya Unni (Tr2 srls)

Presentation materials