In-Orbit Artificial Intelligence and Machine Learning for Space Applications: Versal Space Reference Design - First Design-In Experiences

16<sup>th</sup> March 2023

Dr. Rajan Bedi, CEO, Spacechips, rajan@spacechipsllc.com

The Global Space-Electronics Company



Winner of European Start-Up of 2017 and European High-Reliability Product of 2016, 17 & 18



















|                                        | DDR3 16Gb x 16 | NAND 256Gb x 16 |
|----------------------------------------|----------------|-----------------|
| Number of Devices                      | 64             | 4               |
| Minimum Real Estate (mm<sup>3</sup>)   | 152,320        | 16,796          |
| Power Dissipation (static/dynamic)     | 17W / 17W      | 13.2mW / 5.2W   |
| Approximate Cost (\$)                  | 650k           | 30k             |
| Storage Rate, 16-bit bus (Mbytes/s)    | 2,667          | 100             |
| Total Byte Writes (TBW)                | Unlimited      | 7530 TB         |
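
As a rough cross-check of the table, the sketch below confirms that both options provide the same total capacity and estimates how long the NAND endurance figure would last if written continuously. It assumes the quoted TBW and storage rate apply to the whole NAND array, that writes run non-stop at the full rate, and that decimal units are used; none of these assumptions come from the table itself.

```python
# Figures taken from the DDR3 vs NAND comparison table.
DDR3_DEVICES, DDR3_GBIT_EACH = 64, 16
NAND_DEVICES, NAND_GBIT_EACH = 4, 256

ddr3_total_gbit = DDR3_DEVICES * DDR3_GBIT_EACH
nand_total_gbit = NAND_DEVICES * NAND_GBIT_EACH

# Assumptions (not from the table): continuous writing at the quoted rate,
# decimal units (1 TB = 1e12 bytes, 1 Mbyte = 1e6 bytes), and the TBW
# figure applying to the array as a whole.
NAND_TBW_TB = 7530
NAND_RATE_MBYTES_PER_S = 100

seconds_to_wear_out = (NAND_TBW_TB * 1e12) / (NAND_RATE_MBYTES_PER_S * 1e6)
days_to_wear_out = seconds_to_wear_out / 86_400

print(f"DDR3 total: {ddr3_total_gbit} Gb, NAND total: {nand_total_gbit} Gb")
print(f"NAND endurance at continuous full-rate writes: {days_to_wear_out:.1f} days")
```

A real mission would write in bursts, so the usable lifetime would normally be longer than this worst-case figure.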







## Sensors & Sensor Fusion

- CMOS image sensors provide leading-edge, low-power performance in a small form factor outputting data on 28 sub-LVDS pairs with each capable of operating up to 960 Mbps.
- CCD image sensors output data using 1.6 Gbps, CML high-speed serial links and 1V8 LVCMOS SPI for control.
- LVDS SpaceWire, SpaceFibre, RS-232 and RS-422 are other interfaces typically used by LIDAR sensors.
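
To size the sensor interface, the aggregate CMOS output rate can be estimated from the lane count and per-lane rate quoted above. This is a minimal sketch that assumes every lane runs at its maximum rate with no encoding or protocol overhead.

```python
# Figures quoted for the CMOS image sensor interface.
SUB_LVDS_PAIRS = 28
MBPS_PER_PAIR = 960

# Assumption: all pairs run at the maximum rate and overhead is ignored.
aggregate_mbps = SUB_LVDS_PAIRS * MBPS_PER_PAIR
aggregate_gbps = aggregate_mbps / 1000
aggregate_mbytes_per_s = aggregate_mbps / 8

print(f"Aggregate sensor output: {aggregate_gbps:.2f} Gbps "
      f"({aggregate_mbytes_per_s:.0f} Mbytes/s)")
```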







# Space-Grade AI Accelerators : Tesla





- Magics Technologies' Tesla comprises a RISC-V microcontroller and an AI inference co-processor offering 64 MAC units operating up to 6.4 GOPS when clocked at 100 MHz.
- Tesla contains dedicated AI engines optimising the execution of linear algebra, convolution and neural-network functions to support machine learning.
- Tesla AI Accelerator offers 32 LVCMOS I/O and contains 3 MB of memory.
- Low-power device, 1V8 and 3V3 supply rails, 84-pin QFN package
- Roadmap of new SG AI Accelerators
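
The quoted peak throughput is consistent with counting one multiply-accumulate per MAC unit per clock as a single operation. The sketch below shows that arithmetic; the counting convention is an assumption, and counting each MAC as two operations (multiply plus add) would double the figure.

```python
# Figures quoted for the Tesla AI inference co-processor.
MAC_UNITS = 64
CLOCK_HZ = 100e6

# Assumption: each MAC unit completes one MAC per cycle, counted as one op.
OPS_PER_MAC = 1

peak_gops = MAC_UNITS * CLOCK_HZ * OPS_PER_MAC / 1e9
print(f"Peak throughput: {peak_gops:.1f} GOPS")
```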

## Space-Grade AI Accelerators : Qormino QLS1046



- Teledyne e2v : each Cortex-A72 core offers a performance of 4.72 DMIPS/MHz and with four cores running at 1.8 GHz, the resultant horsepower is 34,000 DMIPS or greater than 45,000 CoreMarks®.
- Each core contains a SIMD vector processing unit, NEON, processing at 56.6 GFLOPS at 1.8 GHz.
- The four MPUs execute the ARMv8-A architecture, each with their own 32 KB L1 data and 48 KB L1 instruction caches, as well as sharing a common 2 MB L2.
- LVCMOS I/O and 5 & 10 Gbps high-speed serial links
- Total power consumption ranges from 6.5 to 20 W, dependent on clock frequency and I/O rate.
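
The 34,000 DMIPS figure follows from the per-core rating, clock frequency and core count. A minimal sketch of that arithmetic, assuming performance scales linearly across the four cores:

```python
# Figures quoted for the QLS1046 Cortex-A72 cores.
DMIPS_PER_MHZ = 4.72
CLOCK_MHZ = 1800
CORES = 4

# Assumption: ideal linear scaling across cores.
dmips_per_core = DMIPS_PER_MHZ * CLOCK_MHZ
total_dmips = dmips_per_core * CORES

print(f"Per core: {dmips_per_core:,.0f} DMIPS, total: {total_dmips:,.0f} DMIPS")
```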

## Space-Grade AI Accelerators : NG-Ultra

[Figure: NG-Ultra block diagram. Processor subsystem: Debug & Trace; SoC Services (Multichannel DMA, V&T Monitor, Clock & Reset, Error Manager, Boot SpaceWire); Processing Unit (ARM® Cortex™-R52 cores with ECC, NEON, MPU, FPU and GIC); External Memory (DDR2/3/4 w/ RS, FLASH); On-chip Memory (eRAM); Connectivity (SPI, SpaceWire, UART, GPIOs); CoreLink™ NIC-400 Network Interconnect. FPGA fabric: DSP (19x24 multiplier, preadder); DPRAMs (true dual port, 48 Kb); High-Speed Connectivity (HSSL, 12 Gbps, SpaceFibre); Complex I/O (1.2V to 1.8V, SpW PHY); General Connectivity (GPIO, 1.8V to 3.3V).]

- NanoXplore: each Cortex-R52 core offers a performance of 1,250 DMIPS/core running at 600 MHz.
- Each core contains a SIMD vector processing unit, NEON.

# Versal Space Reference Design



- AMD/Xilinx's Versal ACAP represents a timely and synergistic OBP engine to enable in-orbit AI and ML.
- Spacechips is bringing-to-market an EM Versal Space Reference Design (XCVC1902-1MSEVSVA2197) later this year to allow you to prototype and de-risk in-orbit AI and ML.
- Populated with EM-grade versions of space-grade parts!
- The XCVC1902 is part of the AI Core Series (133 TOPS) and contains 400 AI engines, 1,968 DSP engines, 899,840 LUTs and 1,968k logic cells.
- Spacechips is bringing-to-orbit a flight-qualified version next year which you can launch to implement AI and ML in-orbit.
- Populated with space-qualified components, delivered with an EICD, an Instruction Manual and functional HDL to prove operation of the signal-chain blocks, no application code!

# Versal Space Reference Design Architecture



Representative transponder that allows RF traffic to be input, digitised, processed and intelligent analytics extracted in real-time.

Data can be stored using the 1 Tb, non-volatile on-board memory (DDR3-speed) and exported to external sub-systems using a variety of space-industry interfaces such as SpaceWire, SpaceFibre, SPI, CAN and 44 HSSLs at 32.75 Gbps.

Blind and R/W Scrubbing options supported as well as access to SelectMAP port!

#### Direct RF Conversion



12-bit ADC, Fs = 1.5 GSPS, BW = 3 GHz to digitise up to S-band (Ku/K-band options can also be offered)

12-bit DAC, Fs = 8 GSPS, BW = 7.5 GHz (K-band options can also be offered)

Default build offers fixed sampling rates, options available to allow sampling rate to be changed and reprogrammed in-orbit
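
Because the quoted ADC sampling rate (1.5 GSPS) is lower than its input bandwidth (3 GHz), digitising S-band carriers directly presumably relies on undersampling in a higher Nyquist zone. The sketch below shows where a carrier would land after sampling; the 2.2 GHz carrier is a hypothetical example, not a figure from the design.

```python
import math


def nyquist_zone_and_alias(f_in_hz: float, fs_hz: float):
    """Return (Nyquist zone, aliased frequency in Hz, spectrum inverted?)."""
    half = fs_hz / 2
    zone = math.floor(f_in_hz / half) + 1
    folded = f_in_hz % fs_hz
    # Frequencies above fs/2 fold back into the first Nyquist zone.
    alias = folded if folded <= half else fs_hz - folded
    # Even-numbered zones appear spectrally inverted after sampling.
    inverted = zone % 2 == 0
    return zone, alias, inverted


FS_ADC_HZ = 1.5e9   # ADC sampling rate quoted above
CARRIER_HZ = 2.2e9  # hypothetical S-band carrier (assumption)

zone, alias_hz, inverted = nyquist_zone_and_alias(CARRIER_HZ, FS_ADC_HZ)
print(f"Zone {zone}, alias at {alias_hz / 1e6:.0f} MHz, inverted: {inverted}")
```

In practice an anti-alias band-pass filter would be needed ahead of the ADC so that only the wanted Nyquist zone is sampled.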

#### Implementation 1 : XCVC1902-1MSEVSVA2197



## **XPE Versal Power Distribution**

XPE Quick Estimate inputs for the XCVC1902VSVA2197-1LI (recovered from the tool screenshot):

| Programmable Logic | Quantity  | Utilisation (%) | Clock (MHz) | Toggle (%) |
|--------------------|-----------|-----------------|-------------|------------|
| LUT                | 899,840   | 100.0           | 300         | 12.5       |
| FF                 | 1,799,680 | 100.0           | 300         | 12.5       |
| BRAM               | 1,934     | 100.0           | 300         | 12.5       |
| URAM               | 463       | 100.0           | 300         | 12.5       |
| DSP                | 1,968     | 100.0           | 300         | 12.5       |

| AI Engine Interface | Cores | Load (%) |
|---------------------|-------|----------|
| PL Stream           | 200   | 100      |
| NoC Stream          | 200   | 100      |

| Processing System | Quantity | Clock (MHz) | Load (%) |
|-------------------|----------|-------------|----------|
| Dual R5           | 0        | 500         | 100      |
| A72               | 0        | 500         | 100      |

Other Processing System entries as extracted: OCM 0; TCM 0; Dual_GEM 200, 100; USB 200, 200.

| I/O and Memory   | Setting                           |
|------------------|-----------------------------------|
| DDR3 memory      | 64 data bits, 1866 Mb/s           |
| GTY transceivers | 32 channels, 25 Gbps line rate    |
| HDIO / XPIO      | 0 inputs, 0 outputs, 0 inouts     |

- When the XCVC1902-1MSEVSVA2197 is fully implemented, its 0.8 V core rail will draw around 140 A, with a total device dissipation of 130 W.
- 57% of the overall power is consumed by the AI engines
- 13% by logic
- 10% by the high-speed transceivers
- 10% by clocking and PLLs
- 5% by processors and the remainder by memory and interfaces
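
The percentages above can be turned into approximate watts per block. This is a minimal sketch using the 130 W total quoted above; the split of the remaining share between memory and interfaces is not given, so it is kept as a single entry.

```python
TOTAL_POWER_W = 130  # total device dissipation quoted above

# Shares quoted above, in percent.
shares_pct = {
    "AI engines": 57,
    "Logic": 13,
    "High-speed transceivers": 10,
    "Clocking and PLLs": 10,
    "Processors": 5,
}
# Whatever is left is attributed to memory and interfaces combined.
shares_pct["Memory and interfaces"] = 100 - sum(shares_pct.values())

watts = {block: TOTAL_POWER_W * pct / 100 for block, pct in shares_pct.items()}

for block, power in watts.items():
    print(f"{block:<26}{shares_pct[block]:>3}%  {power:6.1f} W")
```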

## Factorised Power Architecture





[Figure axes: Power Density [W/in<sup>3</sup>] / Efficiency [%] / Current Density [A/in<sup>2</sup>]]

### Versal Power Distribution

| Domain/<br>Sequence no.            | Rail Name              | Rails                                      | Voltage                                                               | DC Spec. | AC Spec.                        | Current<br>(A) | Power<br>(W) | Step | Comment                                                                                                                                                  |
|------------------------------------|------------------------|--------------------------------------------|-----------------------------------------------------------------------|----------|---------------------------------|----------------|--------------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| LPD/1<br>PL/1<br>PMC/1<br>System/1 | PS_IO (Digital)        | VCCO_500/1/2/3,<br>VCCO_HDIO,<br>VCCO_XPIO | 1.8V – 3.3V (HDIO/PSIO)<br>1.8V – 3.3V (VCCO_50X)<br>1V – 1.5V (XPIO) | ±1%      | ±5% (XPIO)<br>(HDIO/PSIO/XPIO)* | 0.100 - 3      | 10           | 100% | *1.8V, 2.5V at ±5%, and 3.3V at +3/–5%<br>VCCO supplies can be combined if<br>using same Voltage<br>VCCOs must be powered on first in<br>relevant domain |
| System/2 | 0V80_SOC_IO (Digital) | VCC_SOC, VCC_IO | 0.8V | ±1% | ±17mV | 3.5 | 3 | 33% | |
| PMC/2 | 0V80_PMC (Digital) | VCC_PMC | 0.8V | ±1% | ±17mV | 0.350 | 0.3 | 33% | 0.88V for PS Overdrive |
| System/3 | 1V5_VCCAUX (Digital) | VCCAUX | 1.5V | ±1% | ±2% | 4.2 | 6.3 | 33% | |
| LPD/2 | 0V80_PSLP (Digital) | VCC_PSLP | 0.8V | ±1% | ±17mV | 0.300 | 0.2 | 33% | 0.88V for PS Overdrive |
| FPD/1 | 0V80_PSFP (Digital) | VCC_PSFP | 0.8V | ±1% | ±17mV | 1.5 | 1.2 | 70% | 0.88V for PS Overdrive |
| PL/2 | 0V80_RAM (Digital) | VCCINT, VCC_RAM | 0.8V | ±1% | ±17mV | 135 | 108 | 33% | 200A/us Slew Rate |
| PMC/3 | 1V5 (Digital) | VCCAUX_SMON, VCCAUX_PMC | 1.5V | ±1% | ±2% | 0.350 | 0.5 | 100% | |
| PL/3 | 0V88 (Analog) | GTAVCC | 0.88V | ±2% | 10mVpp | 1.7 | 1.5 | 70% | Ripple is steady state, total tolerance is +/-3%. Ripple at FPGA pins, see <u>UG578</u> |
| PL/4 | 1V5 (Analog) | GTAVCCAUX | 1.5V | ±2% | 10mVpp | 0.100 | 0.2 | 70% | Ripple is steady state, total tolerance is +/-3%. Ripple at FPGA pins, see <u>UG578</u> |
| PL/5 | 1V2 (Analog) | GTAVTT | 1.2V | ±2% | 10mVpp | 2.8 | 3.3 | 70% | Ripple is steady state, total tolerance is +/-3%. Ripple at FPGA pins, see <u>UG578</u> |
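
The Power column can be sanity-checked as voltage times current for each rail, and the ±17 mV AC specification can be expressed as a percentage of the 0.8 V nominal. The sketch below does this for three rails from the table; the rail selection is illustrative only.

```python
# (voltage in V, current in A) for a few rails taken from the table.
rails = {
    "VCCINT, VCC_RAM": (0.8, 135),
    "VCCAUX": (1.5, 4.2),
    "GTAVTT": (1.2, 2.8),
}

# P = V * I for each rail.
power_w = {name: volts * amps for name, (volts, amps) in rails.items()}

# AC tolerance of the 0.8 V digital rails expressed in percent.
AC_TOLERANCE_V = 0.017
ac_tolerance_pct = AC_TOLERANCE_V / 0.8 * 100

for name, watts in power_w.items():
    print(f"{name:<18}{watts:7.2f} W")
print(f"±17 mV on 0.8 V = ±{ac_tolerance_pct:.3f}%")
```

The results agree with the table to within rounding (for example, 1.2 V × 2.8 A gives 3.36 W against the 3.3 W listed).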

# Conclusion

- For high-definition SAR video, the raw computing performance of the QLS1046-4GB together with its fast memory interface and small form-factor makes it suitable for extracting real-time insights from Earth-Observation imaging data. DDR4 rates up to 2.1 GHz avoid traditional I/O bottlenecks.
- For situational awareness, e.g., for identification of friend or foe, for space-debris collision avoidance or in-situ resource utilisation for space exploration, FPGAs such as the KU060, PolarFire and NG-ULTRA are able to ingest and process Tbps of data from multiple sensors with low latency in real-time to deliver ASIC-class, system-level performance.
- For object classification, AI inference and autonomous decision making to enable feature identification or reconfigurable, cognitive transponders based on real-time traffic needs, Xilinx's ACAP would result in the most efficient vector-compute solution.
- Tesla and QLS1046-4GB will deliver in-orbit AI and ML at lower power dissipation and less financial cost, but I/O options and sensor fusion are limited!
- NG-ULTRA's quad Cortex-R52 cores will deliver in-orbit AI and ML at lower power dissipation and less financial cost, with FPGA fabric offering good sensor fusion.

# Conclusion

- Spacechips is bringing-to-market a range of smart OBCs and transponders which enable in-orbit AI and ML, baselining the Tesla, Qormino, NG-ULTRA and Versal AI accelerators for different space applications.
- Spacechips is bringing-to-market an EM Versal Space Reference Design (XCVC1902-1MSEVSVA2197) later this year to allow you to prototype and de-risk in-orbit AI and ML (133 TOPS).
- Spacechips is bringing-to-orbit a space-qualified Versal Space Reference Design next year to allow you to implement AI and ML in-orbit.
- Orders for the XCVC1902-1MSEVSVA2197 EM Versal Space Reference Design are currently being taken. It is populated with EM-grade parts and includes an EICD, an Instruction Manual and functional HDL to prove operation of the signal-chain blocks, no application code!
- In 2024, Spacechips will bring-to-market lower-power EM and FM versions of the Versal Space Reference Design baselining the smaller XCVE2302-1MLISFVA784 ACAP (35 W, 45 TOPS, 34 AI engines, 150k LUTs, 324 DSP engines, 329k logic cells).
- Feedback: is there something on the Versal Space Reference Design which you would like to see?



#### Space Electronics

Space-Grade and COTS FPGAs for Space Applications

Space-Systems Engineering

How to Select & Use COTS Components

PCB Design for Space Applications

Testing Satellite Payloads

Mission Design, Frequency Planning & Link-Budget Analyses

In-Orbit AI and Machine Learning for On-Board Processing

Satellite Applications, Remote Sensing and Geospatial Processing

http://www.out-of-this-world-design.info/ https://www.linkedin.com/in/drrajanbedi/

www.spacechipsllc.com


