AI-based Anomaly Detection and Prognosis for P/F intelligence and autonomy

18 Nov 2021, 16:35
25m
Let's Get Digital (Virtual)

Speaker

Mr Filippo Ales (ADS-FHN)

Description

The Failure Detection, Isolation and Recovery (FDIR) subsystem is a critical on-board function on all spacecraft, since it is vital for ensuring the safety, autonomy and availability of the system throughout the mission lifetime. In addition to hard-coded software and hardware protection mechanisms, most modern satellites also implement a PUS-based FDIR design. This mechanism relies on parameter and functional monitoring based on dedicated unit-level TM. It is used both to verify the correct functioning of individual units and, through subsystem-level monitoring, to verify the correct execution of mode-specific tasks (e.g. in the AOCS subsystem). The limitations of PUS-based FDIR stem from the limited number and type of checks that can be performed on the parameters, and from the implementation itself: to work, the anomalies and their signatures must be known at service definition, so that the necessary parameters can be made observable and the associated monitoring checks properly configured.
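
To make this limitation concrete, the following Python sketch shows a fixed-limit check loosely modelled on PUS parameter monitoring: a parameter is reported only after it has stayed outside a predefined low/high band for a set number of consecutive samples. The class, its field names, the parameter name `BAT_V` and the limit values are hypothetical and chosen for illustration; this is not the ECSS service definition itself.

```python
from dataclasses import dataclass, field


@dataclass
class LimitCheck:
    """Fixed low/high limit check on one telemetry parameter (illustrative)."""

    param: str
    low: float
    high: float
    repetitions: int  # consecutive out-of-limit samples needed before reporting
    _count: int = field(default=0, init=False, repr=False)

    def update(self, value: float) -> bool:
        """Feed one sample; return True only on the sample that confirms a violation."""
        if value < self.low or value > self.high:
            self._count += 1
        else:
            self._count = 0  # back in limits: reset the persistence counter
        # Report once, at the transition; continued violations are not re-reported
        return self._count == self.repetitions


# Hypothetical battery-voltage monitor: nominal band 26-34 V, 3 consecutive violations
mon = LimitCheck("BAT_V", low=26.0, high=34.0, repetitions=3)
samples = [28.1, 25.5, 25.4, 28.0, 25.2, 25.1, 25.0, 24.9]
print([i for i, v in enumerate(samples) if mon.update(v)])  # [6]
```

Any degradation that stays inside the band (a slow drift, or a change in how `BAT_V` relates to other parameters) never triggers this check, which is why the anomaly signature has to be anticipated when the limits are defined.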

The use of machine learning and/or deep learning algorithms can significantly enhance the performance of on-board FDIR, especially in identifying and isolating failures at the lowest possible level (equipment level), thus fostering mission availability and autonomy. Artificial intelligence algorithms can identify non-nominal on-board behaviour without relying on classical fixed limits, drawing instead on past telemetry signatures and trends (e.g. orbital conditions, telemetry patterns) or on the interrelationships between telemetry parameters. Implementing AI algorithms purely in software, integrated in the On-board Software of a classical LEON-based On-Board Computer (OBC), is feasible; however, the complexity of the model is limited by the processor, since AI-based FDIR increases the CPU load. For this reason, such a solution would not scale well to an AI algorithm monitoring hundreds, potentially thousands, of TM/TC parameters.
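
As an illustration of detection based on interrelationships between parameters, the sketch below learns the joint mean and covariance of two telemetry parameters from nominal data only and scores new samples by their squared Mahalanobis distance. This is a deliberately simple, swapped-in technique; the abstract does not state which algorithms are used. The parameter pairing, the synthetic training data and the threshold (the 99.9 % quantile of a chi-square distribution with 2 degrees of freedom, which assumes roughly Gaussian nominal data) are all assumptions.

```python
import math
from statistics import mean


class PairwiseMahalanobis:
    """Unsupervised detector for two correlated telemetry parameters (illustrative).

    Learns the mean and covariance of the parameters from nominal samples only;
    the squared Mahalanobis distance then measures how far a new sample departs
    from that joint behaviour, including the correlation between the two.
    """

    def __init__(self, nominal: list[tuple[float, float]]):
        n = len(nominal)
        xs = [x for x, _ in nominal]
        ys = [y for _, y in nominal]
        self.mx, self.my = mean(xs), mean(ys)
        # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
        sxx = sum((x - self.mx) ** 2 for x in xs) / (n - 1)
        syy = sum((y - self.my) ** 2 for y in ys) / (n - 1)
        sxy = sum((x - self.mx) * (y - self.my) for x, y in nominal) / (n - 1)
        det = sxx * syy - sxy ** 2
        if det <= 0:
            raise ValueError("singular covariance: nominal data not varied enough")
        # Closed-form inverse of the 2x2 covariance matrix
        self.ixx, self.iyy, self.ixy = syy / det, sxx / det, -sxy / det

    def score(self, x: float, y: float) -> float:
        """Squared Mahalanobis distance of (x, y) from the nominal distribution."""
        dx, dy = x - self.mx, y - self.my
        return self.ixx * dx * dx + 2 * self.ixy * dx * dy + self.iyy * dy * dy


# 99.9 % quantile of chi-square with 2 dof; only meaningful if nominal data is ~Gaussian
THRESHOLD = -2 * math.log(1 - 0.999)

# Synthetic nominal data: second parameter tracks twice the first, with a small offset
nominal = [(float(i), 2.0 * i + (0.5 if i % 2 else -0.5)) for i in range(20)]
detector = PairwiseMahalanobis(nominal)
for sample in [(10.0, 20.0), (10.0, 30.0)]:
    print(sample, detector.score(*sample) > THRESHOLD)
```

The sample `(10.0, 30.0)` lies inside the nominal range of each parameter taken individually, so per-parameter limit checks would accept it, yet it breaks the learned relationship between the two and is flagged; `(10.0, 20.0)` follows the relationship and is not.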

This work deals with the development of a fully fledged Anomaly Detection and Anomaly Prognosis (ADAP) system, implemented as a hardware unit in the programmable logic of a SoC-based OBC or in the FPGA co-processor of a classical OBC. The Anomaly Detection Module applies anomaly detection algorithm(s) in a purely unsupervised manner. Hence, without needing to know all potential failure modes a priori, an ML-based FDIR can capture anomalous behaviours or failures even when only small symptoms are present, isolating them at the lowest possible level (equipment level). The Anomaly Prognosis Module, instead, is trained on historical telemetry data to cover the other use case for on-board FDIR applications: identifying specific anomaly signature(s) that relate to a specific observed failure (for example, one caused by a design error) and applying a targeted recovery action that would otherwise require a complex software patch.
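
For the prognosis use case, one minimal way to match a known failure signature learned from historical telemetry is a nearest-centroid classifier over simple window features, with each labelled signature mapped to a recovery action. The sketch below assumes that history has already been segmented into labelled windows; the labels, the `SWITCH_TO_REDUNDANT_WHEEL` action, the reaction-wheel current values and the `max_distance` reject threshold are all hypothetical, and the abstract does not specify which model the Anomaly Prognosis Module actually uses.

```python
import math
from statistics import mean
from typing import Optional


def features(window: list[float]) -> tuple[float, float]:
    """Summarise a telemetry window by its mean level and least-squares slope."""
    n = len(window)
    if n < 2:
        raise ValueError("window needs at least two samples")
    t_mean = (n - 1) / 2
    v_mean = mean(window)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return v_mean, num / den


class SignatureMatcher:
    """Nearest-centroid matcher over labelled historical windows (illustrative)."""

    def __init__(self, history: dict[str, list[list[float]]],
                 actions: dict[str, str], max_distance: float):
        self.actions = actions
        self.max_distance = max_distance
        self.centroids = {}
        # One centroid per label: the average feature vector of its windows
        for label, windows in history.items():
            feats = [features(w) for w in windows]
            self.centroids[label] = (mean(f[0] for f in feats),
                                     mean(f[1] for f in feats))

    def classify(self, window: list[float]) -> tuple[str, Optional[str]]:
        """Return (label, recovery action), or ('unknown', None) if nothing is close."""
        f = features(window)
        label = min(self.centroids, key=lambda k: math.dist(f, self.centroids[k]))
        if math.dist(f, self.centroids[label]) > self.max_distance:
            return "unknown", None
        return label, self.actions[label]


# Hypothetical reaction-wheel current windows (A) labelled from past telemetry
history = {
    "nominal": [[0.50, 0.51, 0.49, 0.50, 0.50], [0.52, 0.50, 0.51, 0.49, 0.50]],
    "rw_friction_increase": [[0.50, 0.55, 0.60, 0.65, 0.70],
                             [0.52, 0.58, 0.63, 0.69, 0.74]],
}
actions = {"nominal": "NO_ACTION", "rw_friction_increase": "SWITCH_TO_REDUNDANT_WHEEL"}
matcher = SignatureMatcher(history, actions, max_distance=0.05)
print(matcher.classify([0.51, 0.56, 0.61, 0.66, 0.72]))
```

The reject option matters on board: a window that resembles no known signature is returned as `unknown` rather than forced into the nearest class, leaving it to the unsupervised detector. In practice, features with different scales (level versus slope) would also be normalised before computing distances.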

The ADAP system workflow presented in this study combines the most common machine-learning workflow found in the literature with the classical workflow for developing a satellite failure-tolerance technique, as driven by the mission's reliability, availability, maintainability and operational autonomy requirements. The objective is a solution that not only avoids jeopardizing the mission whatever the failure, but is also sufficiently general to be deployed across satellite constellations without mission-specific tailoring.