Speaker
Mr
Florian Rittner
(Friedrich-Alexander-Universität Erlangen-Nürnberg)
Description
Permanent faults are a critical issue when using SRAM-based FPGAs in space applications. In contrast to temporary effects such as Single-Event Upsets (SEUs), these faults are not recovered by a system restart, whether through an FPGA reset or a power cycle. Permanent faults usually occur with low probability, but they are highly critical because they might lead to a system outage. In harsh environments such as space, physical access to the FPGA is not possible, which restricts repair and debugging options. To overcome permanent faults and the accompanying system outage, we introduce a concept for permanent fault handling based on Wireless Remote Debugging (WRD) via an RF link. The major benefit of this concept is that it ensures continuous FPGA operation for the mission lifetime. The fault sources when using FPGAs in space applications are radiation, thermal stress, mechanical stress, and aging effects. Radiation accounts for the dominant share and, in SRAM-based FPGAs, consists mainly of Single-Event Latch-up (SEL) and Total Ionizing Dose (TID) effects. However, a permanent fault only becomes critical if it is visible, so we do not distinguish between the different fault sources, as they lead to the same behavior. Our proposed concept is based on a Fault Detection, Isolation, and Recovery (FDIR) approach, which consists of a coarse fault detection, for example with a Built-In Self-Test (BIST) design for all FPGA primitives, a fine-grained fault isolation for the defective primitive type, and finally an exclusion strategy for the fault recovery. Communication with the FPGA requires a WRD capability. Automated test procedures support the FDIR process by automatically executing the test process as well as the isolation and recovery cycles, which also minimizes user interaction. For the test process itself, different primitive test concepts from the state of the art can be used.
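The three FDIR stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `SimulatedFpga` class, the primitive type names, the per-site test, and the exclusion set are all assumptions made for illustration, and a real system would run these steps against the device over the WRD link.

```python
from dataclasses import dataclass, field

# Hypothetical primitive types; a real device defines its own set.
PRIMITIVE_TYPES = ("LUT", "FF", "BRAM", "DSP")


@dataclass
class SimulatedFpga:
    """Stand-in for a device reached over a remote debug link (assumption)."""
    defective: set = field(default_factory=set)  # (type, site) pairs with permanent faults
    excluded: set = field(default_factory=set)   # sites the design must avoid
    sites_per_type: int = 8

    def run_bist(self, ptype):
        """Coarse test: True if any non-excluded site of this type is faulty."""
        return any(t == ptype and (t, s) not in self.excluded
                   for (t, s) in self.defective)

    def test_site(self, ptype, site):
        """Fine-grained test of a single site: True if the site fails."""
        return (ptype, site) in self.defective


def detect(fpga):
    """Coarse detection: list the primitive types whose BIST fails."""
    return [p for p in PRIMITIVE_TYPES if fpga.run_bist(p)]


def isolate(fpga, ptype):
    """Fine-grained isolation: find the failing sites of one primitive type."""
    return [s for s in range(fpga.sites_per_type)
            if (ptype, s) not in fpga.excluded and fpga.test_site(ptype, s)]


def recover(fpga, ptype, sites):
    """Recovery by exclusion: mark failing sites so they are no longer used."""
    for s in sites:
        fpga.excluded.add((ptype, s))


def fdir_cycle(fpga):
    """Run one automated detect / isolate / recover cycle and report findings."""
    report = {}
    for ptype in detect(fpga):
        sites = isolate(fpga, ptype)
        recover(fpga, ptype, sites)
        report[ptype] = sites
    return report
```

Calling `fdir_cycle` repeatedly models the cyclic test-and-repair use: once all faulty sites are excluded, a further cycle reports nothing.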
To account for user preferences and boundary conditions, we developed an evaluation concept that can be parameterized by the user and results in a score indicator. We show that permanent fault handling can increase the reliability of FPGA-based systems, since they are fully functional after a permanent fault affects the FPGA. Different tools were implemented to reduce user interaction. Our evaluation concept is able to find the best solution for a given application. Generally, two main application fields are conceivable. First, systems with high overall costs and low fault probabilities (e.g. a long-term GEO mission) demand such permanent fault handling to ensure high reliability. Second, systems with low costs but a high fault probability (e.g. a CubeSat) form the other application field; here, our approach is used for cyclic tests and repairs. Other application fields are covered by state-of-the-art solutions, for example a radiation-hardened device.
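One way a user-parameterized score indicator could work is a normalized weighted sum, sketched below. The abstract does not state how the score is computed, so the weighting scheme, the function names, and the criterion names used with it are assumptions for illustration only.

```python
def score(option, weights):
    """Weighted mean of criteria values (each assumed in 0..1, higher is better)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * option[name] for name in weights) / total


def best_option(options, weights):
    """Return the name of the candidate solution with the highest score."""
    return max(options, key=lambda name: score(options[name], weights))
```

The user would supply the weights (for example, how much reliability matters relative to cost) and a dictionary of candidate solutions rated per criterion; changing the weights can change which candidate ranks first.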
Primary author
Mr
Florian Rittner
(Friedrich-Alexander-Universität Erlangen-Nürnberg)