9–11 Apr 2018
European Space Research and Technology Centre (ESTEC)
Europe/Amsterdam timezone

Permanent Fault Handling in SRAM-based FPGAs

11 Apr 2018, 11:40
20m
Newton 1 and 2 (European Space Research and Technology Centre (ESTEC))

Keplerlaan 1 2201AZ Noordwijk ZH The Netherlands

Speaker

Mr Florian Rittner (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Description

Permanent faults are a critical issue when using SRAM-based FPGAs in space applications. Unlike temporary effects such as Single-Event Upsets (SEUs), these faults are not cleared by a system restart, whether through an FPGA reset or a power cycle. Permanent faults usually occur with low probability but are highly critical, as they may lead to a system outage. In harsh environments such as space, physical access to the FPGA is not possible, which restricts the options for repair and debugging. To overcome permanent faults and the accompanying system outage, we introduce a concept for permanent fault handling based on Wireless Remote Debugging (WRD) via an RF link. The major benefit of this concept is that it ensures continuous FPGA operation over the mission lifetime. The fault sources when using FPGAs in space applications are radiation, thermal stress, mechanical stress, and aging effects. Radiation is the dominant source; in SRAM-based FPGAs it consists mainly of Single-Event Latch-up (SEL) and Total Ionizing Dose (TID) effects. However, a permanent fault only becomes critical once it is visible, and we do not distinguish between the different fault sources, as they lead to the same behavior. Our proposed concept is based on a Fault Detection, Isolation, and Recovery (FDIR) approach, which consists of a coarse fault detection, for example with a Built-In Self-Test (BIST) design for all FPGA primitives, a fine-grained fault isolation for the defective primitive type, and finally an exclusion strategy for fault recovery. Communication with the FPGA requires a WRD capability. Automated test procedures support the FDIR process by automatically executing the test process as well as the isolation and recovery cycles, which also minimizes user interaction. For the test process itself, different primitive test concepts from the state of the art can be used.
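The three FDIR stages described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the tooling presented in the talk: the `SimulatedFpga` class, the function names, and the recursive-halving isolation strategy are all hypothetical, and the remote link is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedFpga:
    """Toy stand-in for a device that would be reached over a remote debug link."""
    sites: dict                                   # primitive type -> number of sites
    faulty: set = field(default_factory=set)      # {(ptype, index)} permanent faults
    excluded: set = field(default_factory=set)    # sites barred from further use

    def run_test(self, ptype, indices):
        """Return True if a test design using exactly these sites passes."""
        return not any((ptype, i) in self.faulty for i in indices)


def active_sites(fpga, ptype):
    """Sites of one primitive type that have not been excluded yet."""
    return [i for i in range(fpga.sites[ptype]) if (ptype, i) not in fpga.excluded]


def detect(fpga):
    """Coarse detection: one BIST-like run per primitive type."""
    return [p for p in fpga.sites if not fpga.run_test(p, active_sites(fpga, p))]


def isolate(fpga, ptype, indices):
    """Fine-grained isolation by recursive halving (an assumed strategy)."""
    if fpga.run_test(ptype, indices):
        return []
    if len(indices) == 1:
        return list(indices)
    mid = len(indices) // 2
    return isolate(fpga, ptype, indices[:mid]) + isolate(fpga, ptype, indices[mid:])


def fdir_cycle(fpga):
    """Run detection, isolation, and recovery once; return newly excluded sites."""
    newly_excluded = []
    for ptype in detect(fpga):
        for i in isolate(fpga, ptype, active_sites(fpga, ptype)):
            fpga.excluded.add((ptype, i))         # recovery: exclude the defective site
            newly_excluded.append((ptype, i))
    return newly_excluded


fpga = SimulatedFpga(sites={"LUT": 16, "BRAM": 4, "DSP": 4},
                     faulty={("LUT", 5), ("BRAM", 2)})
excluded = fdir_cycle(fpga)
```

After one cycle the two defective sites are excluded and a repeated coarse detection passes, which mirrors the idea that the system stays functional once the fault has been routed around.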
To take user preferences and boundary conditions into account, we developed an evaluation concept that can be parameterized by the user and yields a score indicator. We show that permanent fault handling can increase the reliability of FPGA-based systems, as they remain fully functional after a permanent fault affects the FPGA. Different tools were implemented to reduce user interaction. Our evaluation concept is able to find the best solution for a given application. Generally, two main application fields are conceivable. First, systems with high overall costs and low fault probabilities (e.g. a long-term GEO mission) demand such permanent fault handling to ensure high reliability. Systems with low costs but a high fault probability (e.g. a CubeSat) form the second application field. Here, our approach is used for cyclic tests and repairs. Other application fields are covered by state-of-the-art solutions, for example a radiation-hardened device.
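One way to picture a user-parameterized evaluation that yields a score indicator is a weighted average over rated criteria. The abstract does not specify the criteria or the scoring formula, so everything in this sketch is an assumption: the criterion names, the candidate names, the ratings, and the weighted-average rule are purely illustrative.

```python
def score(ratings, weights):
    """Weighted average of criterion ratings in [0, 1]; higher is better."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[c] * ratings[c] for c in weights) / total


def best_candidate(candidates, weights):
    """Return the name of the candidate with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name], weights))


# Hypothetical test concepts with made-up ratings per criterion.
candidates = {
    "full_bist": {"coverage": 0.9, "speed": 0.3, "low_overhead": 0.4},
    "targeted_test": {"coverage": 0.6, "speed": 0.8, "low_overhead": 0.7},
}

# Two hypothetical user profiles: one favors coverage, the other favors speed.
coverage_first = {"coverage": 3.0, "speed": 1.0, "low_overhead": 1.0}
speed_first = {"coverage": 1.0, "speed": 2.0, "low_overhead": 1.0}

choice_a = best_candidate(candidates, coverage_first)
choice_b = best_candidate(candidates, speed_first)
```

With these invented numbers the coverage-oriented profile selects `full_bist` and the speed-oriented profile selects `targeted_test`, showing how the same candidates can rank differently depending on the user's parameters.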

Primary author

Mr Florian Rittner (Friedrich-Alexander-Universität Erlangen-Nürnberg)
