**EDHPC 2023** 

European Data Handling & Data Processing Conference for Space 2 - 6 October 2023 | Juan-Les-Pins | France



# Single Event Effects


Basic Mechanisms and Testing of Complex Devices

**DEFENCE AND SPACE** 

Indranil Chatterjee Airbus eXpert – Semiconductor Devices October 2, 2023



# **Export Control Information**

#### Section 1 (not applicable in France, please go to section 3)

This document contains Technical Information : Yes X No

If No to section1: please complete Section 2 If Yes to section1: please complete Section 3 as applicable

Section 2 (not applicable in France, please go to section 3)

I confirm the document does not contain Technical Information and is « Not-Technical »

Name:

Date:

#### Section 3

France

3a. National and EU regulations Export Control Assessment

This document has been assessed against applicable export control regulations in

X Germany Spain I UK I Other: [Specify the country]

X and does not contain Controlled Technology<sup>1</sup> and is therefore « Not Listed / Not Controlled »

and contains Controlled Technology with export control classification [Insert classification number, e.g ML22x, xExxx, AMAx]

Note: Any transfer of this document in part or in whole must be made in accordance with the appropriate export control regulations. Prior to any transfer outside of the responsible legal entity, confirmation of an applicable export licence or authorisation must be obtained from the local Export Control Officer (ECO).

#### 3b. US (ITAR/EAR) Export Control Assessment

X This document does not contain US origin Technical Data (Technology)

This document contain « Technology » which is controlled by the U.S government under [USML category number / ECCN] and which has been

received by [Legal entity] under the authority of [Licence number / ITAR exemption / EAR licence exception / NLR]

This document contains technology which is designated as EAR99 (subject to EAR and not listed on the USML/CCL.)

Note: Any re-export or re-transfer of this document in part or in whole must be made in accordance with the appropriate regulation (ITAR or EAR) and applicable authorization. If in any doubt please contact your local ECO.

October 2, 2023

#### **3c. Technical Rater Information**

This document has been assessed by the following Technical Rater :

Assessed and classified by:

Date classification completed:

<sup>1</sup> "Controlled Technology" is defined as any Information necessary for the design, development, production, use, operation, maintenance or repair of export controlled goods. Examples of such Information are blueprints, plans, diagrams, models, engineering designs, manuals, requirements specifications and instructions etc. If in any doubt please contact your local Export Control Officer (ECO)


# Content

- Radiation environment & effects
- Single-Event Effects
  - Basic Mechanisms
  - Impact of technology scaling
  - Complex devices, e.g. FPGA
- SEE Testing of Complex Components
  - SEE Test Standards
  - Sample preparation
  - A typical test campaign
  - SEE testing with laser
- System-level SEE Testing as an alternative
- Conclusions



### In Moore we trust ...



- Technology scaling, driven by process and device innovations, has kept Moore's law alive and will keep it alive for several more generations.

- SC Focus: Advanced bulk planar and FinFET technologies for FPGAs and ASICs.




**AIRBUS** 

### Space Environment: A Cosmic Particle Hotpot

- Earth's atmosphere is continuously bombarded by cosmic particles, solar cosmic rays
- Magnetic field shields us from most of these particles
- These particles can disrupt space- and ground-based electronics
- Satellites, space probes, servers, routers, supercomputers are commonly affected by radiation faults







After E. G. Stassinopoulos and J. P. Raymond, Proc. of the IEEE 76, 1988



#### **Terrestrial Radiation Environment**

- Neutrons (up to 100s MeV)
  - Produced when cosmic rays collide with air molecules
  - Sea-level flux at NYC estimated to be $\sim 13$ neutrons/cm<sup>2</sup>/hr
- Alpha particles
  - From impurities in packaging material
  - Major alpha emitters: <sup>232</sup>Th and <sup>238</sup>U decay
- Thermal Neutrons (<1 MeV)
  - Through <sup>10</sup>B interaction
- Emerging sources
  - Muons (from cosmic rays)



After B. Narasimham. IEEE NSREC Short Course 2021




#### How does radiation affect electronics

Press headlines shown on the slide:

- "'Zombiesat' threatens Arctic telecommunications: Northwestel eyes out-of-control Galaxy 15 satellite" (Nunatsiaq News)
- "Space Radiation Doomed Russian Mars Probe That Crashed: Reports" (SPACE.com Staff, January 31, 2012)
- "Cosmic Rays, the heart of Cisco's Router Problems" (Nathaniel Richards, Lead Technical Correspondent)
- "Is Your Smartphone Threatened... By the Cosmos?" (Backstories, April 23, 2018)
- "How Space Weather Can Influence Elections on Earth" (Becky Ferreira, February 17, 2017)
- "Cosmic rays damage automotive electronics" (EE Times, Design How-To)
- "Another rewrite for 737 Max software as cosmic bit-flipping tests glitch out systems – report" (Gareth Corfield, 2 Aug 2019)
- "Toyota's Recall Woes May Have Started in Space" (Chuck Squatriglia, March 29, 2010)
- "Solar Storm Knocks 40 SpaceX Satellites Out of Orbit" (Smart News, Corryn Wetzel, February 14, 2022)

#### **Radiation Effects in Microelectronics**

- Total ionizing dose (TID) effects
  - Accumulation of ionizing dose deposition over a long time
  - Causes slow, gradual degradation of the device's performance
- Displacement damage (DD)
  - Accumulation of crystal lattice defects caused by high-energy radiation
  - Primarily induced by protons and electrons
  - Opto-electronic components and CCDs are particularly affected
- Single event effects (SEE)
  - A high ionizing dose deposition, from a single high-energy particle, occurring in a sensitive region of the device

Figure labels (TID): electron-hole pairs generated by ionizing radiation; hopping transport of holes through localized states in the SiO<sub>2</sub> bulk; deep hole trapping near the Si/SiO<sub>2</sub> interface; interface traps resulting from interaction of holes. Figure labels (DD): heavy ion ("cosmic ray"), exiting particle, defects (interstitial, vacancy, dopant or impurity atom).






# Single Event Effects

- Soft errors
  - Correctable by reprogramming the circuit into its correct logic state
  - If the error rate is too high, it can cause system degradation and potentially mission failure
  - Arise when a heavy ion or proton deposits sufficient energy to change the state of a circuit node
- Hard errors
  - Are created when a heavy ion deposits sufficient energy to cause permanent damage to a device
  - Error cannot be corrected by reprogramming
  - Types of hard errors include latchup, snapback, single-event burnout, and single-event gate rupture



After P. Roche, IEEE NSREC Short Course, Paris, 2014.



#### Some Definitions

- LET: Linear Energy Transfer, a measure of the energy deposited by an ionizing particle per unit distance, normalized to the material density; expressed in MeV·cm<sup>2</sup>/mg
- Cross-section: expresses the likelihood of an error due to SEE; units of area (cm<sup>2</sup>)

$$\text{Cross-section} = \frac{\text{Error count}}{\text{Fluence}}, \qquad \text{for example} \quad \frac{1000\ \text{errors}}{10^{6}\ \text{particles/cm}^2} = 10^{-3}\ \text{cm}^2$$

- Soft error rate (SER): rate at which a device or circuit encounters upsets; expressed in FITs (failures in time)
  - 1 FIT = 1 failure in $10^9$ device-hours (or 1 ppm per 1000 hrs)
- SER = Cross-section × Particle flux
  - FIT rate: $10^{-3}$ cm<sup>2</sup>/device × $10^{-3}$ α/cm<sup>2</sup>·hr × $10^{9}$ device-hours = 1000 FIT
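
The two definitions above can be chained in a few lines. The following is a minimal sketch that reproduces the slide's own example numbers; the function names are illustrative and not taken from any standard or tool.

```python
def cross_section_cm2(error_count: int, fluence_per_cm2: float) -> float:
    """Cross-section = error count / fluence, in cm^2."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return error_count / fluence_per_cm2


def fit_rate(sigma_cm2: float, flux_per_cm2_hr: float) -> float:
    """Soft error rate in FIT: failures per 1e9 device-hours."""
    return sigma_cm2 * flux_per_cm2_hr * 1e9


# Slide example: 1000 errors observed at a fluence of 1e6 particles/cm^2
sigma = cross_section_cm2(1000, 1e6)

# Slide example: particle flux of 1e-3 particles/cm^2/hr
fit = fit_rate(sigma, 1e-3)

print(f"cross-section = {sigma:.1e} cm^2, SER = {fit:.0f} FIT")
```

The test fluence only enters the cross-section; the environment flux only enters the rate. That separation is what allows one accelerator measurement to be reused for several environments.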






**Airbus Amber** 

### **Fundamental Response**

#### Charge Generation

- Incident ion interacts with material to produce free charge carriers (electrons and holes)

#### Charge Recombination and Collection

- Electrons and holes move by diffusion and drift through the material (oxides and semiconductors) to a sensitive node while they also recombine

#### Circuit Response

- The additional charge on the node alters the voltage that ultimately leads to single event effects. Voltage glitches may propagate through a circuit
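
To give a feel for the charge generation step, the sketch below converts an LET value into charge deposited per micrometre of track in silicon. The constants (silicon density of 2.33 g/cm<sup>3</sup>, 3.6 eV per electron-hole pair) are common textbook values and are assumptions here, not figures from the slides; the function name is illustrative.

```python
# Assumed textbook constants for silicon (not taken from the slides)
SI_DENSITY_MG_PER_CM3 = 2330.0    # 2.33 g/cm^3
EV_PER_PAIR = 3.6                 # mean energy to create one electron-hole pair
ELEMENTARY_CHARGE_C = 1.602176634e-19


def let_to_pc_per_um(let_mev_cm2_per_mg: float) -> float:
    """Charge generated per micrometre of ion track in silicon, in pC/um."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    ev_per_um = mev_per_cm * 1e6 / 1e4          # MeV -> eV, cm -> um
    pairs_per_um = ev_per_um / EV_PER_PAIR
    return pairs_per_um * ELEMENTARY_CHARGE_C * 1e12   # C -> pC


for let in (1.0, 10.0, 60.0):
    print(f"LET {let:5.1f} MeV.cm2/mg -> {let_to_pc_per_um(let):.3f} pC/um")
```

Under these assumptions an LET of roughly 97 MeV·cm<sup>2</sup>/mg corresponds to about 1 pC/µm. How much of that charge reaches a sensitive node depends on the collection and recombination processes listed above.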








#### Multiple Node Charge Collection





- 20 nm
- The smaller the device geometry, the worse the impact



After I. Chatterjee, RADECS 2021 Short Course



# **Transient Propagation in Logic Circuits**




### Soft Errors in Combinational Logic








# CMOS Scaling & Soft Error Trends

- Servers in 2025: exascale computing challenges
  - 2-3 nm technology
  - 100 billion transistors per chip
- With scaling, the number of SRAM bits per IC increases, resulting in an increasing trend for FIT/IC
- Logic SER will exceed latch SER because of device scaling and higher clock rates

Figure: performance (100 MFlop/s to 10 EFlop/s) versus year (1990 to 2015) for the #1 supercomputer, the #500 supercomputer, and the combined top 500 supercomputers.

#### **Scaling Trends – Memories**



- Scaling generally results in decreasing error/bit for memories and latches
- Multiple-bit upsets provide the biggest contribution to event rates for all LETs.



#### Scaling Trends – Combinational Logic





- Logic errors are not as dominant as expected; however, operating frequency may increase with scaling, resulting in a higher contribution from logic.

- With stagnating CPU frequencies, logic errors are unlikely to dominate the chip error rate.


#### System Level Scaling Trends



- Each system → built with 1000s of ICs
- Each IC → billions of transistors
- Unhardened IC: 10,000 FIT/IC → 10 million FIT/system → system MTBF is ~1 soft error every 4 days!

- With increasing packing density, the number of upsets at the IC level stays similar across technology nodes.
- With increased system-level complexity, the system-level error rate continues to increase with each new generation.
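
The "one soft error every 4 days" figure follows directly from the FIT definition. A minimal sketch of the arithmetic, assuming exactly 1000 ICs per system (the slide says "1000s", and 1000 is the count consistent with its 10 million FIT total):

```python
DEVICE_HOURS_PER_FIT = 1e9   # 1 FIT = 1 failure per 1e9 device-hours


def system_fit(fit_per_ic: float, ic_count: int) -> float:
    """System FIT, assuming IC failure rates simply add."""
    return fit_per_ic * ic_count


def mtbf_hours(fit: float) -> float:
    """Mean time between failures, in hours, for a given FIT rate."""
    return DEVICE_HOURS_PER_FIT / fit


sys_fit = system_fit(10_000, 1_000)    # 10,000 FIT/IC, 1000 ICs (assumed count)
mtbf_h = mtbf_hours(sys_fit)
mtbf_days = mtbf_h / 24

print(f"{sys_fit:.0e} FIT/system -> MTBF = {mtbf_h:.0f} h ({mtbf_days:.1f} days)")
```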




# **FPGA Design Blocks**

- Configuration
- User fabric:
  - DFFs (sequential memory ... holds logic)
  - Combinational logic (computation logic ... no hold)
  - Global routes: clocks and resets (connect to DFFs ... controls hold)
  - Embedded memory (random access: addressing is used for access)
- Hidden logic:
  - Analog
  - Combinations of DFFs, clocks, resets, combinational logic
  - Hard-wired specialized cores (high-speed SERDES, processors, AI blocks, etc.)
- All blocks have unique SEE susceptibilities



After M. Berg, SELSE 2016



# SEU/SET versus System Failures (SEF)

- System failures are design dependent (topology)
- An occurrence of an SEU or SET does not definitively cause a system upset.
- SEF is a probability of a SEU or SET causing operation to go wrong.
- Upper bounding methods assume all SEUs will cause a SEF (generally ignores uncaptured SETs).
- Clock and reset trees (global routes) are susceptible to SETs.
- Clock trees in ASICs and FPGAs are the most overlooked mechanism of failure due to ionization.
- Global route susceptibilities should be considered when determining system risk.
- Global route susceptibilities are different for each FPGA device.



#### FPGA SEU Characterization Data and Extrapolation

- FPGA error rates are (user) design dependent.
- Error rates are derived from  $\sigma$  data... but  $\sigma$  data are not design specific.
- How do we extrapolate data for mission-specific characterization?
- Goal: predict an error rate for a target FPGA user-design.
- For older FPGA generations we generally use bounding techniques:
  - Upper bound techniques are derived from SEE testing (studying trends and identifying dominant mechanisms of failure).
  - Error rates are extrapolated from the dominant mechanisms of failure and their utilization within the target user-design.
- However, upper bound calculations might not meet requirements:
  - More testing is required (test-as-you-fly)
  - Mitigation might be required

Easy with rad-hard FPGA devices. Not easy with commercial FPGAs!

After M. Berg, SELSE 2016
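
The upper-bound idea on this slide can be illustrated with a small calculation. This is a sketch under stated assumptions, not the method of the cited reference: it assumes every upset in a used resource causes a system failure and that contributions from different block types simply add. The block names, per-element cross-sections, element counts, and flux are hypothetical placeholders, not measured data.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    sigma_cm2_per_element: float   # hypothetical per-bit or per-cell cross-section
    used_elements: int             # elements actually used by the target design


def upper_bound_sigma(blocks: list[Block]) -> float:
    """Upper-bound design cross-section: every upset in a used element counts."""
    return sum(b.sigma_cm2_per_element * b.used_elements for b in blocks)


def upper_bound_rate_per_day(blocks: list[Block], flux_per_cm2_day: float) -> float:
    """Upper-bound failure rate for a given particle flux."""
    return upper_bound_sigma(blocks) * flux_per_cm2_day


# Hypothetical utilization of a target user design
design = [
    Block("configuration", 1e-14, 2_000_000),
    Block("BRAM", 5e-15, 1_000_000),
    Block("DFF", 2e-14, 50_000),
]

sigma_ub = upper_bound_sigma(design)
rate_ub = upper_bound_rate_per_day(design, flux_per_cm2_day=100.0)  # hypothetical flux
print(f"sigma_ub = {sigma_ub:.2e} cm^2, rate_ub = {rate_ub:.2e} failures/day")
```

If a bound computed this way does not meet the requirement, the options are the ones listed above: more design-specific testing or added mitigation.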


# SRAM-Based FPGA Configuration Implementation and SEU Susceptibility





### Concept of CRAM driven unavailability assessment of FPGAs




#### New Generation SoC ... New Challenges for SEE

- Significant amount of embedded circuitry (hidden logic)
- Hidden circuits are extremely complex and require complex test methods.
- Increased focus on $\sigma_{HiddenLogic}$

 $\sigma_{SEF} = f(\sigma_{configuration}, \sigma_{BRAM}, \sigma_{functionalLogic}, \sigma_{HiddenLogic})$ 





After M. Berg, SSQ 2023






# Single-Event Testing

- Why test?
  - To determine the presence and characteristics of single events
    - Destructive or non-destructive
    - Voltage and temperature dependence
    - Amplitude and width of SETs
  - To calculate the SEE rate for a radiation environment
- SEE testing is usually done at accelerator facilities, which irradiate the whole device with ions, some in air and some in vacuum.
- Component packages must be opened, de-processed, thinned...
- Other testing methods that provide spatial and temporal information include:
  - Focused, collimated ion beam
  - Focused, pulsed laser beam
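
When a run ends with no observed events, the result is often reported as an upper limit on the cross-section rather than as zero. The sketch below assumes Poisson counting statistics, which is a common assumption but is not stated on the slide; the function name and the fluence value are illustrative.

```python
import math


def sigma_upper_limit_zero_events(fluence_per_cm2: float,
                                  confidence: float = 0.95) -> float:
    """Upper limit on cross-section (cm^2) when zero events were observed.

    For a Poisson process, P(0 events) = exp(-mu). The largest mean count mu
    still compatible with zero events at the given confidence level is
    -ln(1 - confidence).
    """
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1.0 - confidence) / fluence_per_cm2


# Illustrative run: no events seen after 1e7 ions/cm^2
limit = sigma_upper_limit_zero_events(1e7)
print(f"sigma < {limit:.2e} cm^2 at 95% confidence")
```

Because the limit scales inversely with fluence, showing that a device is free of a given effect to a tighter bound costs proportionally more beam time.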


### **SEE Test Guidelines**

- Test guideline documents that define SEE testing of microelectronic devices and circuits (last update):
  - ASTM F1192 (2018)
  - ESCC Basic Specification No. 25100 (10/2002; Reaffirmed 10/2014)
  - JESD57 (11/2017)
  - JESD89 (10/2007; Reaffirmed 01/2012)
  - JESD234 (10/2013)
  - MIL-STD-750, Test Method 1080 (01/2012)
- These documents do a fairly good job of defining procedures for heavy-ion testing. HOWEVER...
  - The SEE landscape is dynamic. New types of SEE signatures are observed on complex COTS components





# Understanding the DUT

- Understand device process technology and application conditions
  - SEE testing is always application-specific
  - What sort of impacts might SEE have on a device?
  - Could the device under test be susceptible to destructive effects?





# **Choosing a Test Facility**

- Identify a suitable test facility and consider systematic variables
  - Is ion range or dE/dx (ionization/length) more important?
  - Can the component package be opened, thinned? If not, choose a high penetration ion beam.
  - What's the sensitive area(s) geometry and are there any hardening techniques (design and/or process) employed?
- Ion selection, pulsed laser sources, energy range, flux range, dosimetry, beam profile and purity, and accelerator technology





### **Device Preparation**

- Ion penetration range is short compared to packaging materials
  - Cannot use protons for everything
- What is the package type and die material? Are there heat sinks?
- Thinning and polishing for backside irradiation is not trivial
- Methods: mechanical, chemical, and electromagnetic (ablation lasers)
- As with any commercial technology, destructive effects are always a concern



M. R. Shaneyfelt, et al., SEE Symposium, 2011.





K. LaBel, et al., SEE Symposium, 2011.



# A Typical Test Campaign

- Most of the time before, during, and after a SEE test is spent
  - 1. Deciding what you want to measure and how;
  - 2. Verifying you can do 1.; and,
  - 3. Figuring out what you actually got.
- Because SEE testing is real-time, many aspects are dynamic, so contingency planning is essential
- Always have a backup plan





## Variability of Radiation Performance of COTS





• Manufacturer process changes also affect SEE sensitivity. For COTS, traceability of procured devices remains mandatory.



Beer-Lambert law (first order): $E(z) = E_0 \cdot e^{-\alpha z}$

Energy loss per unit length: $\frac{dE}{dz} = -\alpha E_0 e^{-\alpha z}$

### SEE Tests with Laser : Mechanisms



- Photoelectric effect: band-to-band optical absorption if E<sub>ph</sub> > E<sub>gap</sub>
- Ionization track
  - Track radius limited by diffraction laws (~ 1  $\mu$ m for 1.06  $\mu$ m)
  - Range in Silicon : function of the selected wavelength (>700 μm for 1.06 μm)
- Both ions and laser (with an appropriate wavelength) can interact with silicon and generate localized charges
- Different particle interaction mechanism but the consequence is the same → localized charge generation
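
The first-order Beer-Lambert relation quoted for laser testing, $E(z) = E_0 e^{-\alpha z}$, can be evaluated numerically to see how much pulse energy remains at a given depth. In this sketch the absorption coefficient is a hypothetical value, chosen only so that the 1/e depth is 700 µm, in line with the range quoted above for 1.06 µm; it is not a measured material constant.

```python
import math


def remaining_energy(e0: float, alpha_per_um: float, z_um: float) -> float:
    """Beer-Lambert (first order): E(z) = E0 * exp(-alpha * z)."""
    return e0 * math.exp(-alpha_per_um * z_um)


def energy_loss_per_um(e0: float, alpha_per_um: float, z_um: float) -> float:
    """dE/dz = -alpha * E0 * exp(-alpha * z), the local energy loss rate."""
    return -alpha_per_um * e0 * math.exp(-alpha_per_um * z_um)


ALPHA_PER_UM = 1.0 / 700.0   # hypothetical: 1/e depth of 700 um
E0 = 1.0                     # normalized pulse energy entering the silicon

for depth_um in (0.0, 100.0, 350.0, 700.0):
    e = remaining_energy(E0, ALPHA_PER_UM, depth_um)
    loss = energy_loss_per_um(E0, ALPHA_PER_UM, depth_um)
    print(f"z = {depth_um:5.0f} um: E/E0 = {e:.3f}, dE/dz = {loss:.2e} per um")
```

The model covers linear absorption only. Additional losses, such as free-carrier absorption in highly doped regions, would reduce the energy reaching the sensitive volume further.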



### SEE Tests with Laser

- Specific consideration for laser testing
  - Metal over-layers: testing through the backside is required
  - For backside irradiation, device preparation requires a mirror-like surface
  - The laser is focused within the cell's sensitive volume
  - Highly doped areas can contribute to free carrier (FC) absorption, which reduces the energy available at the sensitive volume. A thinning process may be required.






(After S. Morand, RADHARD 2021)



#### SEE Tests with Laser

- Extremely useful tool for pre-selection of devices
- Characterize several application conditions in order to:
  - Reduce the number of heavy-ion tests → cost reduction
  - Advise designers in (almost) real time on the best way forward to minimize the SEE impact → schedule optimization
- Optimize the occurrence rate prediction through better knowledge of the device structure
- Map the sensitive functions (e.g. FPGA)
- Better identify and test mitigation solutions: check their efficiency at board or system level
- Investigate problems that occurred during heavy-ion tests



After S. Morand et al., IEEE TNS, Jun 2021




#### System-level Radiation Testing

- SEE testing is expensive. For example, SEL testing of a complex FPGA can cost as much as \$100k. Systematic tests of all possible SEEs can cost close to half a million.
- Alternative test regime: system-level SEE tests





#### System-level SEE Testing

- Key constraints: only high-penetration beams can be used, and beam field homogeneity is needed
- Verification of mitigations and assessment of the need for additional mitigations
  - Using a laser or proton beam, it is possible to evaluate the efficacy of the mitigation strategies applied.
  - Example: power DC/DC converter board. The goal was to check the efficiency of the mitigation solutions implemented on a given design using a PWM from TI, by cross-checking the main output voltages while applying a laser beam "in situ" on the EM board.
- Protons are a good estimator for soft errors but a very poor one for destructive events
  - Proton testing provides only a rather loose upper bound for failure rate prediction
    - Untested board: upper bound of 0.1 failure/board-day
    - Fluence of 10<sup>10</sup> p/cm<sup>2</sup>: 0.01 failure/board-day
    - Fluence of 10<sup>11</sup> p/cm<sup>2</sup>: 0.003 failure/board-day



After S. Guertin, NASA Handbook 2017


# **Issues with System-level SEE Testing**

- It is a pass/fail test
  - No well-defined mitigation strategy if the outcome is a 'fail'
- Lack of observability
  - Difficult to understand what went wrong
- Data portability
  - Data collected is strongly design and application dependent, cannot be reused for other designs
- Limited level of confidence because of the lack of information at various stages




### Conclusion

- The effects of SEE on semiconductor devices can be of two types: destructive (SEL, SEB, SEGR, etc.) and non-destructive (SET, SEU, SEFI, etc.)
- Scaling generally results in decreasing error/bit for memories and latches, but with increased system-level complexity the system-level error rate continues to increase with each new generation
- With complex devices, such as FPGAs, evaluating soft error performance is complicated; hidden logic is expected to be a major contributor for next-generation FPGAs
- System-level management of soft errors: deploy redundancy and hardware and software mitigation strategies
- Laser testing is extremely useful for pre-selection of devices or for understanding failure modes
- System-level testing will become more popular as a valid and cost-efficient tool to perform RHA for systems used in space missions with a high acceptance of risk. In other cases, it can be used to validate SEE mitigation strategies.

# Thank you

I. Chatterjee, EDHPC 2023 Tutorial - Soft Errors | © Copyright Airbus Defence and Space GmbH 2023

This document and all information contained herein is the sole property of Airbus. No intellectual property rights are granted by the delivery of this document or the disclosure of its content. This document shall not be reproduced or disclosed to a third party without the expressed written consent of Airbus. This document and its content shall not be used for any purpose other than that for which it is supplied. Airbus, its logo and product names are registered trademarks.

