#### Analysis and Mitigation of SEUs on SRAM-based FPGAs using the VERI-Place tool

Marco Desogus, <u>Luca Sterpone</u> Politecnico di Torino



#### Goals

- Estimate the impact of soft errors on modern SRAM-based FPGAs
- Analyse the critical configuration bits
- Validate the analysis through fault injection
- Automate SEU-aware implementation

## Outline

- Introduction
- Related Work
- The VERI-Place tool
- The designer flow
- Experimental Setup, Results and demo

#### Introduction

- A Single Event Upset (SEU) may result in data corruption, transient disturbances, or high-current conditions
- SEUs affect many types of devices and technologies
- An SEU can, *if not handled well*, cause unwanted functional interrupts or, in the worst case, catastrophic failures

#### Introduction

- Reconfigurable devices are flexible, low cost, and adaptable to many applications
- Hardening techniques are required to reduce the probability of soft errors
- A fast and accurate estimate of the impact of soft errors is needed

#### **Related Work**

• Three basic approaches have been developed to analyse the effects of SEUs in FPGA devices:

|             | Accelerated ground testing                    | Fault injection                                                | Analytical approach                                   |
|-------------|-----------------------------------------------|----------------------------------------------------------------|-------------------------------------------------------|
| Description | FPGA exposed to a flux of radiation           | Faults are inserted in the system and the results are monitored | Based on synthesis tools and software programs        |
| Advantages  | • Correct estimation of SEU sensitivity       | • No permanent damage<br>• Cheaper                             | • Reduced time<br>• No permanent damage<br>• Low cost |
| Drawbacks   | • High cost<br>• Permanent damage to the DUT  | • Very time-consuming                                          | • High effort in the development phases               |

## Analytical approach: The Xilinx Essential Bits

Configuration memory bit classification

"Xilinx essential bits technology uses an algorithm to identify which configuration bits are the essential bits.

If an essential bit is upset, it changes the design circuitry. However, the upset might not affect the function of the design"

[Xilinx Application Notes XAPP538]




## **Analytical Approach: VERI-Place**

Configuration memory bit classification

#### **VERI-Place** analysis

"The tool identifies a bit as critical if it affects a resource of the circuit and the effect can be propagated to the output of the circuit considering its redundancy and logic masking"

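To illustrate the definition above, here is a minimal sketch on a toy model. It is not the VERI-Place algorithm: it assumes a bit-flip simply inverts the output of one replicated module, and it treats the fault as critical only if the circuit output changes for some input. The names `majority`, `module`, `circuit_output`, and `is_critical` are hypothetical.

```python
from itertools import product

def majority(a, b, c):
    """2-out-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)

def module(x, y):
    """Toy combinational function replicated in each domain (assumed: AND)."""
    return x & y

def circuit_output(x, y, faulty_domains=(), tmr=True):
    """Evaluate the circuit; a fault inverts the output of the listed domains."""
    n = 3 if tmr else 1
    outs = [module(x, y) ^ (1 if d in faulty_domains else 0) for d in range(n)]
    return majority(*outs) if tmr else outs[0]

def is_critical(faulty_domains, tmr):
    """Critical if some input vector makes the output differ from the fault-free one."""
    return any(
        circuit_output(x, y, faulty_domains, tmr) != circuit_output(x, y, (), tmr)
        for x, y in product((0, 1), repeat=2)
    )

print(is_critical((0,), tmr=False))   # True: no redundancy, the fault propagates
print(is_critical((0,), tmr=True))    # False: masked by the voter
print(is_critical((0, 1), tmr=True))  # True: two domains hit, the voter is defeated
```

In this toy model a fault confined to one TMR domain is not critical, while the same fault on the plain circuit is, which is the role redundancy plays in the classification.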


#### **VERI-Place core**

- VERI-Place consists of an algorithm that executes the following steps:
  - loading the circuit description
  - performing the topological analysis, considering all possible configuration memory modifications, and providing the error rate
  - generating reliability-oriented placement constraints













[Figure sequence: soft-error accumulation in the FPGA configuration memory and the FPGA array]

## **VERI-Place Error Classification**

- The accumulation is performed in the following steps:
  - An SEU is selected from the SEU list
  - The circuit graph topology is modified according to the SEU type
  - All the modifications corresponding to a single configuration memory bit are accumulated
  - The final graph topology is evaluated
    - In case of a structural modification, the effect is classified as an error
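The steps above can be sketched as follows. This is an illustrative model under stated assumptions, not the tool's implementation: the circuit is a set of directed edges, each configuration bit maps to a list of topological modifications, and the modification types `"open"` and `"short"` are assumed for the example (the slides do not list the SEU types). All function and variable names are hypothetical.

```python
def apply_modification(edges, mod):
    """Return a new edge set with one topological modification applied."""
    kind, edge = mod
    if kind == "open":    # assumed SEU type: an existing connection is removed
        return edges - {edge}
    if kind == "short":   # assumed SEU type: a new connection is created
        return edges | {edge}
    raise ValueError(f"unknown modification type: {kind}")

def classify_bits(golden_edges, seu_list):
    """Map each configuration bit to 'error' or 'no effect'."""
    golden = frozenset(golden_edges)
    report = {}
    for bit, mods in seu_list.items():
        edges = golden
        for mod in mods:  # accumulate every modification tied to this bit
            edges = apply_modification(edges, mod)
        # Evaluate the final topology: any structural change counts as an error.
        report[bit] = "error" if edges != golden else "no effect"
    return report

golden = {("in", "lut0"), ("lut0", "ff0"), ("ff0", "out")}
seus = {
    101: [("open", ("lut0", "ff0"))],                           # breaks a used net
    102: [("open", ("lut1", "ff1"))],                           # hits an unused resource
    103: [("short", ("in", "ff0")), ("open", ("in", "lut0"))],  # two accumulated changes
}
print(classify_bits(golden, seus))
# {101: 'error', 102: 'no effect', 103: 'error'}
```

Comparing the final edge set against the golden one is the simplest possible structural check; a real analysis would also account for redundancy and logic masking before declaring an error.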

#### **VERI-Place New features**

- Overall logic and routing resource exposure
- Global sensitive bit report
- Critical sensitivity <u>Heat Map</u> for Programmable Interconnection Points (PIPs)
- Generation of the error rate figure on the basis of bit-flip accumulation ratio and expected Flip-Flop switching activity (*maximum* and *minimum*)

#### The VERI-Place tool is online

- VERI-Place software executable and User Manual are available online
- Example on B14 plain and X-TMR designs

#### www.cad.polito.it



## **VERI-Place: the designer flow**

1. Analysis of the critical bit sensitivity
2. Analysis of the Single Event Upset Assumption
3. Analysis of the SEU accumulation and breakeven point
4. Radiation-environment error rate prediction
5. Application of the mitigation rules

## **Critical bit sensitivity reports**

VERI-Place reports the number of logic and routing resources that are potentially critical








## **Single Event Upset Assumption**

A single SEU is analyzed, taking logic masking and redundancy into account



#### **SEU accumulation**



[Figure: SEU accumulation; axis label: SEUs]

#### Scrub-rate breakeven point



[Figure: scrub-rate breakeven point; axis label: SEUs]

#### **Radiation-environment prediction**

- VERI-Place error rate is computed in two different ways:
  - Minimal switching activity
  - Maximal switching activity



## Mitigation

- The VERI-Place tool is able to execute
  - Re-placement
  - Re-packing
- The constraints are compliant with the Single Fault Assumption rule
- VERI-Place manages
  - Placement of CLB carry logic
  - Placement of CLB packed TMR
  - TMR LUT packing
  - RAM resource checking
  - UCF packing and area group files for Xilinx tools

#### **Experimental Demo Setup**

• DUT: ARM Cortex-M0 DS SoC on Xilinx Virtex-5



| Resource name   | ARM SoC Plain (used/available) |
|-----------------|--------------------------------|
| Slice registers | 962/28,800                     |
| Slice LUTs      | 3,563/28,800                   |
| Memory blocks   | 16/7,680                       |

#### **Experimental Setup - Workload**

- Test program: bubble sort algorithm
  - Generate random vector: a vector of 1,500 elements with values in the range 1 to 100,000
  - Ascending order: output all the elements through the UART in ascending order
  - Descending order: output the vector through the UART in descending order

An error is recorded when the results report stored on the host machine differs from the golden copy (obtained with no fault injected)
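The workload and the golden-copy check can be sketched on the host side as follows. This is a minimal model under assumptions: the real test program runs on the DUT and reports over the UART, whereas here the output stream is a plain list, the random seed is an arbitrary choice, and the names `bubble_sort`, `run_workload`, and `is_error` are hypothetical.

```python
import random

def bubble_sort(values):
    """Return a sorted copy of the input using bubble sort."""
    data = list(values)
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return data

def run_workload(seed, size=1500, low=1, high=100_000):
    """Generate the random vector, then emit it in ascending and descending order."""
    rng = random.Random(seed)
    vector = [rng.randint(low, high) for _ in range(size)]
    ascending = bubble_sort(vector)
    descending = ascending[::-1]
    return ascending + descending  # stands in for the UART output stream

def is_error(report, golden):
    """An error is any difference between the report and the golden copy."""
    return report != golden

golden = run_workload(seed=0)  # fault-free reference run
faulty = list(golden)
faulty[10] ^= 1                # emulate one corrupted output word
print(is_error(golden, golden), is_error(faulty, golden))  # False True
```

A word-by-word comparison like this flags any deviation in the output, including a single flipped bit in one element.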

#### **Experimental Results**

• Exposed bits

| Resource name | ARM SoC Plain |
|---------------|---------------|
| Logic         | 19,536        |
| Routing       | 110,422       |
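As a quick reading aid, the share of each resource class among the exposed bits can be derived from the table above. The snippet only performs arithmetic on the two reported figures; the variable names are illustrative.

```python
# Exposed bits for the ARM SoC Plain design, taken from the table above.
exposed = {"Logic": 19_536, "Routing": 110_422}

total = sum(exposed.values())
shares = {name: bits / total for name, bits in exposed.items()}

print(f"Total exposed bits: {total:,}")  # Total exposed bits: 129,958
for name, bits in exposed.items():
    print(f"{name}: {bits:,} bits ({shares[name]:.1%})")
# Logic: 19,536 bits (15.0%)
# Routing: 110,422 bits (85.0%)
```

Routing accounts for roughly 85% of the exposed bits in this design.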

• Exposed bits logic resources data

| Resource name | ARM SoC Plain |
|---------------|---------------|
| Used RPM      | 1,328         |
| Not used RPM  | 2,992         |
| Total PIP     | 55,211        |

**RPM**: Routing Programmable Matrix

#### **Experimental Results**

Generated heat map of critical resources

• The heat map is an effective means of visually identifying critical resources in the design under test

#### **Comparative analysis**

Comparison of the error rate obtained by fault injection and by VERI-Place on the ARM SoC Plain design



## Mitigation – SEU Assumption

• VERI-Place applies the reliability-oriented rules



#### Demo

- ARM XTMR on Virtex-5 and Artix-7
  - Fault injection execution
  - VERI-Place execution

#### Roadmap



#### Thank you!

#### **Spare slides**

## An example: original circuit

The bitstream

The original netlist





## An example: sensitive bit

The bitstream



• The corrupted netlist






## An example: 2-bit accumulation

• The bitstream



• The corrupted netlist



#### **TMR vs Plain SEU sensitivity**



[Figure: TMR vs plain SEU sensitivity; axis label: SEUs]