



### EXPRO+ ESA AO/1-8032/14/NL/AK CCSDS Lossless Compression IP-CORE Space Applications

### Final Presentation Days 8<sup>th</sup> May 2017

08/05/2017

TRP-AO8032 Final Presentation Days





Introduction and motivation

CCSDS lossless compressors overview

SHyLoC SystemC Model

SHyLoC VHDL Description and Verification

Technology Mapping to FPGA and ASIC

Conclusions



# Why do we need on-board compression?



- As the resolution of remote sensors, and consequently their data rates, continues to increase, the available downlink bandwidth remains comparatively stable.
- One solution is to apply compression **on board the satellites**  $\rightarrow$  payload data processors must be able to perform this task.
- Lossless compression allows for reducing the data volume without compromising the data integrity (the image can be fully recovered after decompression).



# Standard Algorithms of the CCSDS

ULPGC

 CCSDS algorithms (Consultative Committee for Space Data Systems)



### **CCSDS 121**

Universal lossless compressor based on Rice codes.

### CCSDS 122

Lossless or lossy 2D compressor based on the discrete wavelet transform (DWT).

### **CCSDS 123**


 Multi/hyperspectral compressor based on prediction.






# Motivation



### The challenge

Efficient hyperspectral image compression on the available on-board hardware (maximum data reduction with minimum on-board resource requirements).

### The goal

- Develop low-complexity high-throughput hardware architectures.
- Efficient implementation on space-qualified FPGAs and ASICs.

- Software
  - CPU
  - DSP
  - GPU

(not space-qualified)

- Hardware
  - ASIC
  - FPGA









# **TRP Project: Objectives**





European Space Agency

**TRP activity** 

CCSDS Lossless Compression IP-CORE Space Applications (SHyLoC)



- Main objectives
  - Model and implement two lossless compression IP cores.
  - Described in SystemC and VHDL.
  - Compliant with the CCSDS 121 and CCSDS 123 standards, including all configuration modes.
  - To be included in ESA's IP core repository.
  - Compatible with technologies:
    - One-time programmable FPGAs (Microsemi);
    - Reconfigurable FPGAs (Virtex5);
    - ASIC (DARE standard cell library)



# Summarized Workplan



| WP1 | IP Core Definition (IUMA)                                         |
|-----|-------------------------------------------------------------------|
| WP2 | System IP Core Implementation and Design Space Exploration (IUMA) |
| WP3 | VHDL IP Core Implementation (IUMA)                                |
| WP4 | IP Core validation and deployment (TELETEL)                       |
| WP5 | Technology mapping (TELETEL & TASE)                               |

- The consortium for this Project is formed by:
  - The Institute for Applied Microelectronics (IUMA) from the University of Las Palmas de Gran Canaria (ULPGC), Spain.
  - **TELETEL SA, Greece.**
  - Thales Alenia Space Spain (TASE).

UNIVERSIDAD DE LAS PALMAS DE GRAN CANARIA Instituto Universitario de Microelectrónica Aplicada






TRP-AO8032 PDR Review




# CCSDS Lossless compressors overview





- Prediction is based on neighboring samples in the same band and in the *P* previous bands (local sum and local differences).
- The predicted central local difference $\hat{d}_{z,y,x}$ is the dot product of a weight vector $W_{z,y,x}$ and the local differences vector $U_{z,y,x}$: $\hat{d}_{z,y,x} = W_{z,y,x}^T \cdot U_{z,y,x}$.
- The weight vector is updated according to the prediction error.
- Prediction residuals are mapped and then encoded.
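The prediction loop can be sketched in Python as below. This is a simplified illustration, not the bit-accurate CCSDS 123.0-B-1 arithmetic: the hypothetical `shift` parameter stands in for the standard's adaptive weight-update scaling, and weights are clipped to $[-2^{\Omega+2}, 2^{\Omega+2}-1]$, where $\Omega$ is the weight resolution (13 in the baseline configuration used later in this presentation). Function names are illustrative.

```python
def predict_local_difference(weights, local_diffs):
    """Predicted central local difference: d_hat = W^T . U (integer dot product)."""
    if len(weights) != len(local_diffs):
        raise ValueError("W and U must have the same length")
    return sum(w * u for w, u in zip(weights, local_diffs))


def update_weights(weights, local_diffs, error, shift=4, omega=13):
    """Simplified sign-error update (not the exact CCSDS 123 rule).

    error >= 0 moves each weight by +(u >> shift), error < 0 by -(u >> shift);
    results are clipped to [-2**(omega + 2), 2**(omega + 2) - 1].
    """
    w_min, w_max = -(2 ** (omega + 2)), 2 ** (omega + 2) - 1
    sign = 1 if error >= 0 else -1
    return [min(w_max, max(w_min, w + sign * (u >> shift)))
            for w, u in zip(weights, local_diffs)]


if __name__ == "__main__":
    W = [3, 2, 1]        # weights for P = 3 (illustrative values)
    U = [16, -32, 48]    # local differences (illustrative values)
    print(predict_local_difference(W, U))   # 32
    print(update_weights(W, U, error=5))    # [4, 0, 4]
```

Only the sign of the prediction error is used, so each weight update reduces to a shift, an addition and a clip.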


# CCSDS Lossless compressors overview

![](_page_10_Picture_1.jpeg)

![](_page_10_Figure_2.jpeg)

- Entropy coding:
  - Sample-adaptive
  - Block-adaptive (CCSDS-121)
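Before entropy coding, signed prediction residuals are mapped to non-negative integers. A minimal Python sketch of such a mapper, assuming unsigned D-bit samples; it follows the prediction error mapper of the CCSDS 121 preprocessor, whereas CCSDS 123 uses a closely related rule that also depends on the scaled predicted sample, so this is illustrative rather than bit-exact for the CCSDS 123 IP. The function name is hypothetical.

```python
def map_residual(sample, predicted, d_bits=16):
    """Map the signed residual (sample - predicted) to a non-negative integer.

    theta is the distance from the prediction to the nearest limit of the
    sample range; small residuals of either sign get interleaved codes.
    """
    s_min, s_max = 0, (1 << d_bits) - 1
    delta = sample - predicted
    theta = min(predicted - s_min, s_max - predicted)
    if 0 <= delta <= theta:
        return 2 * delta
    if -theta <= delta < 0:
        return 2 * abs(delta) - 1
    return theta + abs(delta)


if __name__ == "__main__":
    # 4-bit samples predicted as 5: each possible sample maps to a distinct code
    print([map_residual(s, 5, d_bits=4) for s in range(16)])
    # [9, 7, 5, 3, 1, 0, 2, 4, 6, 8, 10, 11, 12, 13, 14, 15]
```

Because the mapping is a bijection onto $[0, 2^D - 1]$, the decoder can invert it once it has recomputed the same prediction.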

![](_page_10_Figure_6.jpeg)


# CCSDS Lossless compressors overview

![](_page_11_Picture_1.jpeg)

![](_page_11_Figure_2.jpeg)


#### Block-adaptive entropy coding

![](_page_11_Figure_7.jpeg)

# CCSDS 121 Standard

- Block-adaptive encoder:
  - For each block of J samples, the coder selects the coding option that yields the shortest encoded block.
  - J is a configurable value (8, 16, 32, 64).
  - Basic code: Fundamental Sequence (FS) codeword.
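A minimal Python sketch of block-adaptive option selection, simplified to the k-split options (k = 0 being plain FS coding) and the no-compression option. The zero-block and second-extension options, the option identifier bits and the reference sample of CCSDS 121 are left out, and all function names are hypothetical.

```python
def fs_codeword(m):
    """Fundamental Sequence (FS) codeword for a non-negative integer m: m zeros then a one."""
    return "0" * m + "1"


def split_sample_bits(block, k):
    """Encoded length in bits of a block using the k-split option (k = 0 is plain FS)."""
    return sum((x >> k) + 1 for x in block) + k * len(block)


def encode_split(block, k):
    """Bit string for the k-split option: all FS codewords first, then the k LSBs of each sample."""
    fs_part = "".join(fs_codeword(x >> k) for x in block)
    lsb_part = "".join(format(x & ((1 << k) - 1), f"0{k}b") for x in block) if k > 0 else ""
    return fs_part + lsb_part


def choose_option(block, d_bits=16):
    """Return (option, bits) for the shortest encoding of one block of J mapped samples.

    Simplified: only k-split options (k = 0 .. d_bits - 1) and no-compression
    are compared; zero-block, second-extension and option ID bits are ignored.
    """
    best = ("no-compression", d_bits * len(block))
    for k in range(d_bits):
        bits = split_sample_bits(block, k)
        if bits < best[1]:
            best = (f"k={k}", bits)
    return best


if __name__ == "__main__":
    block = [3, 1, 0, 2, 5, 1, 0, 4]        # J = 8 mapped residuals (illustrative)
    print(choose_option(block, d_bits=8))   # ('k=1', 22)
    print(len(encode_split(block, 1)))      # 22
```

For this block, plain FS coding costs 24 bits; sending one LSB per sample uncoded shortens the FS part enough to save two bits, so the coder picks k = 1.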

![](_page_12_Figure_5.jpeg)

![](_page_12_Picture_6.jpeg)

![](_page_12_Figure_7.jpeg)

# SHyLoC Concept - Requirements

![](_page_13_Figure_1.jpeg)

- The IP cores can be combined into a logical single entity.
- The CCSDS-121 IP can work independently or as the block-adaptive coder of the CCSDS-123 IP.
- Compression parameters selectable at runtime.
- Capable of accepting samples in any of the possible arrangements: band-sequential (BSQ), band-interleaved per pixel (BIP) or band-interleaved per line (BIL).
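The three arrangements differ only in loop nesting. The following Python sketch (hypothetical function name) yields the (z, y, x) coordinates of an Nx × Ny × Nz image in arrival order. The nesting also hints at the throughput differences reported for the architectures: in BIP, consecutive samples belong to different bands and can be predicted in parallel, while in BSQ consecutive samples share a band and its weight vector, so each prediction depends on the previous update.

```python
def sample_order(nx, ny, nz, order):
    """Yield (z, y, x) coordinates in the order samples arrive at the compressor."""
    if order == "BSQ":      # band-sequential: the whole of band z, then band z + 1
        for z in range(nz):
            for y in range(ny):
                for x in range(nx):
                    yield z, y, x
    elif order == "BIP":    # band-interleaved per pixel: all bands of pixel (x, y) together
        for y in range(ny):
            for x in range(nx):
                for z in range(nz):
                    yield z, y, x
    elif order == "BIL":    # band-interleaved per line: row y of band 0, row y of band 1, ...
        for y in range(ny):
            for z in range(nz):
                for x in range(nx):
                    yield z, y, x
    else:
        raise ValueError(f"unknown sample order: {order}")


if __name__ == "__main__":
    for order in ("BSQ", "BIP", "BIL"):
        print(order, list(sample_order(2, 1, 2, order)))
```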


# SHyLoC Concept - Requirements

- Throughput up to 1 Gbps (~60 Msamples/s for 16-bit input) when implemented on a Virtex5 FX130.
- Include AMBA AHB interfaces.
- Compatible with GRLIB and LEON2-FT.

![](_page_14_Figure_4.jpeg)


![](_page_15_Picture_0.jpeg)

![](_page_15_Picture_1.jpeg)


![](_page_15_Picture_8.jpeg)

### CCSDS123 and CCSDS121 SystemC Models

![](_page_16_Picture_1.jpeg)

Figure: SystemC model block diagram: AHB slave configuration interfaces (CONFIG123), CCSDS123 predictor (LOCALSUM, LOCALDIFF, PREDICTOR, weight update, MAP) producing mapped prediction residuals, followed by a sample-adaptive encoder or the CCSDS-121 block-adaptive coder, and a PACKER that outputs the compressed file.

CCSDS123 SystemC Model

CCSDS121 SystemC Model

![](_page_16_Figure_4.jpeg)

- The models have the same interfaces and behaviour as their VHDL counterparts:
  - Compatible I/O interfaces enable plug and play connectivity between the IP cores/SystemC models.
  - AHB interfaces (TLM).
  - Configuration at compile-time and run-time.
- Exploration allowed to identify:
  - Relationships between configuration parameters and hardware complexity.
  - Data dependencies limiting throughput.
  - Potential need for storage elements external to the FPGA.

# SystemC Modelling findings

![](_page_17_Picture_1.jpeg)

- Findings of the CCSDS-123 models:
  - Complexity depends on image size, P and compression order.
  - Need for a different architecture for each compression order.
  - Optimize design for baseline P value (P = 3).
  - High throughput in BIP; lower in BSQ and BIL.
  - Need for external memory.
- Findings of the CCSDS-121 model:
  - Influence of J on storage and latency.
  - High throughput.

![](_page_17_Figure_11.jpeg)

![](_page_18_Picture_0.jpeg)

![](_page_18_Picture_1.jpeg)


![](_page_18_Picture_8.jpeg)

# SHyLoC VHDL description

- Perform lossless on-board data compression according to the CCSDS 121 and CCSDS 123 standard algorithms.
- Separate VHDL IP cores that can work independently, or be connected together.

![](_page_19_Figure_3.jpeg)

![](_page_19_Figure_4.jpeg)

- CCSDS 123 IP core:
  - High-performance lossless compression of multispectral and hyperspectral data.
  - Supports BSQ, BIP and BIL sample order.
  - Can be used as external pre-processor (predictor) for the CCSDS 121 IP core.

- CCSDS 121 IP core:
  - Universal lossless compressor based on Rice coding.
  - Can be used as external entropy coder for the CCSDS 123 IP core.

![](_page_19_Picture_12.jpeg)


# CCSDS123 and CCSDS121 Interfaces

![](_page_20_Figure_1.jpeg)

![](_page_20_Picture_2.jpeg)

### CCSDS123 and CCSDS121 Configuration

![](_page_21_Picture_1.jpeg)

- Configuration at compile-time: VHDL generic constants.
- Configuration at runtime: memory-mapped registers.
- Runtime configuration might be enabled or disabled:
  - When disabled: compile-time generic constants determine the configuration.
  - When enabled: compile-time generic constants determine the boundaries of the runtime configuration values.
- Example: configuration of number of bands for prediction, P
  - Runtime configuration disabled: constant P\_MAX used to set the parameter. AHB slave interface is not used.
  - Runtime configuration enabled: constant P\_MAX determines the range of allowed runtime configuration values for *P* [0 to P\_MAX].
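The rule for P can be modelled in a few lines of Python. The class and function names are hypothetical and only mirror the behaviour described above; they are not the SHyLoC register map or VHDL code, and rejecting an out-of-range runtime value is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompileTimeConfig:
    """Compile-time settings (stand-ins for VHDL generic constants such as P_MAX)."""
    p_max: int
    runtime_enabled: bool


def effective_p(cfg, runtime_p=None):
    """Number of prediction bands P actually used by the core."""
    if not cfg.runtime_enabled:
        # Generics fix the value; the AHB slave interface is not used.
        return cfg.p_max
    # Generics only bound the value written at runtime: P must lie in [0, P_MAX].
    if runtime_p is None or not 0 <= runtime_p <= cfg.p_max:
        raise ValueError(f"runtime P must be in [0, {cfg.p_max}]")
    return runtime_p


if __name__ == "__main__":
    print(effective_p(CompileTimeConfig(p_max=3, runtime_enabled=False), runtime_p=1))  # 3
    print(effective_p(CompileTimeConfig(p_max=15, runtime_enabled=True), runtime_p=3))  # 3
```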

![](_page_21_Picture_10.jpeg)

# CCSDS123 and CCSDS121 Interfaces

![](_page_22_Figure_1.jpeg)

![](_page_22_Picture_2.jpeg)

# CCSDS123 and CCSDS121

![](_page_23_Picture_1.jpeg)

![](_page_23_Figure_2.jpeg)

## CCSDS121 – Architectural description

Simplified schematic (ccsds121\_shyloc\_comp)

![](_page_24_Figure_2.jpeg)


# CCSDS123 and CCSDS121 Interfaces

![](_page_25_Figure_1.jpeg)

# CCSDS123 - Design considerations

![](_page_26_Picture_1.jpeg)

- Compression order and image dimensions:
  - Achievable throughput differs:
    - BIP  $\rightarrow$  allows the prediction of a pixel's samples in all bands to be parallelized.
    - BSQ  $\rightarrow$  consecutive samples belong to the same band, so the prediction of one sample must finish before the next one can start.
    - BIL  $\rightarrow$  intermediate situation.
  - Different storage requirements depending on compression order, image size and P.

![](_page_26_Figure_8.jpeg)

## CCSDS 123 – Architectural description (top)

![](_page_27_Picture_1.jpeg)

![](_page_27_Figure_2.jpeg)


# CCSDS123 and CCSDS121

![](_page_28_Picture_1.jpeg)

![](_page_28_Figure_2.jpeg)

# CCSDS123 VHDL description

- A different architecture for each compression order: BIP, BSQ, BIL.
- Basic predictor block diagram:

![](_page_29_Figure_3.jpeg)


# CCSDS123 IP BIP/ BIP-MEM architectures

![](_page_30_Figure_1.jpeg)

![](_page_30_Figure_2.jpeg)

- Parallel structures for dot product computation and weight update.
- Weight vectors are stored internally, one per band.
- BIP-MEM: the top\_right FIFO is located in external memory.

NOTE: t = x + y \* Nx

![](_page_30_Figure_7.jpeg)

![](_page_30_Figure_8.jpeg)

# CCSDS123 IP BIP/ BIP-MEM architectures

![](_page_31_Picture_1.jpeg)

![](_page_31_Figure_2.jpeg)

![](_page_31_Figure_3.jpeg)

![](_page_31_Figure_4.jpeg)


# CCSDS123 IP BSQ architecture

![](_page_32_Picture_1.jpeg)

![](_page_32_Figure_2.jpeg)

- The local differences vector is stored outside the FPGA.
- Dot product and weight update performed serially in order to reduce complexity.

NOTE: t = x + y \* Nx

## CCSDS123 IP BIL architecture

![](_page_33_Picture_1.jpeg)

![](_page_33_Figure_2.jpeg)

- Dot product and weight update as in BIP.
- Additional internal storage for local differences vector.
- Specific scheduling.

*NOTE:* t = x + y \* Nx

![](_page_33_Figure_7.jpeg)

# SHyLoC – IP Database


- The VHDL sources are in a Git repository.
- A makefile is provided to configure, simulate or synthesize the IP cores.
- Configuration options are set using \*.csv files.
- A Python script generates the necessary VHDL files to configure the IP core and testbench.
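The presentation does not describe the CSV layout or the generated files, so the following Python sketch only illustrates the approach under an assumed two-column `name,value` format. The package name, the column names and the `NX_GEN` parameter are hypothetical (only `P_MAX` appears in this presentation).

```python
import csv
import io


def csv_to_vhdl_package(csv_text, package_name="ccsds123_parameters"):
    """Turn 'name,value' CSV rows into a VHDL package of integer constants."""
    lines = [f"package {package_name} is"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        name, value = row["name"].strip(), int(row["value"])
        lines.append(f"  constant {name} : integer := {value};")
    lines.append(f"end package {package_name};")
    return "\n".join(lines)


if __name__ == "__main__":
    example = "name,value\nP_MAX,3\nNX_GEN,512\n"   # hypothetical configuration file
    print(csv_to_vhdl_package(example))
```

Generating the VHDL from one CSV row per configuration keeps the IP core sources untouched while the makefile iterates over configurations.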

![](_page_34_Figure_6.jpeg)


# CCSDS 121 - Testbench

Reference software implementations: the CCSDS 123.0-B-1 software developed by ESA and the Emporda software from UAB.

![](_page_35_Figure_2.jpeg)


# CCSDS 123 - Testbench

Reference software implementations: the CCSDS 123.0-B-1 software developed by ESA and the Emporda software from UAB.

![](_page_36_Figure_2.jpeg)


# SHyLoC- Verification Plan

![](_page_37_Picture_1.jpeg)

- Verification dataset combines:
  - 35 images with different numbers of samples, dynamic ranges and sample distributions.
  - 10 sets of compile-time configuration parameters
  - 10 sets of run-time configuration parameters
- Executed tests:
  - Basic Sanity (BS)  $\rightarrow$  36 tests
  - Intentional  $\rightarrow$  15 tests
  - Stress  $\rightarrow$  46 tests
  - Additional tests to improve coverage  $\rightarrow$  22 tests
- Total: 121 tests.
- Simulations are performed with QuestaSim, and automated with scripts.
- A pass/fail simulation report is generated.

# SHyLoC– Validation

![](_page_38_Picture_1.jpeg)

- Demonstrator with CCSDS 123 + CCSDS 121 IPs.
- Compression Core Board: an FPGA-based board with SpW interfaces, based on PLDA's XpressV6:
  - Xilinx Virtex-6 LX240T.
  - Up to 2x4GB DDR2 SDRAM
  - Extension connector with up to 168 signals
  - PCIe core with multi-channel DMA host interface
- EGSE: based on the iSAFT Simulator and connected to the Compression Core Board through a SpW link; it validates the core's functionality by stimulating the board and retrieving the results through RMAP transactions over SpW.

![](_page_38_Picture_9.jpeg)

# SHyLoC– Validation

![](_page_39_Picture_1.jpeg)

### Demonstrator with CCSDS 123 + CCSDS 121 IPs.

![](_page_39_Figure_3.jpeg)


# SHyLoC – Validation

![](_page_40_Picture_1.jpeg)

- Test cases  $\rightarrow$  4 sets of compile-time parameters:
  - BIP-MEM (01\_BS\_Val)  $\rightarrow$  (Up to Nx = 8192; Ny = 8192; Nz = 2048)
  - BIP (02\_BS\_Val)  $\rightarrow$  (Up to Nx = 1024; Ny = 512; Nz = 256)
  - BSQ (03\_BS\_Val)  $\rightarrow$  (Up to Nx = 1024; Ny = 1024; Nz = 1024)
  - BIL (04\_BS\_Val)  $\rightarrow$  (Up to Nx = 512; Ny = 512; Nz = 256)
- In all tests, the CCSDS121 is configured as external entropy coder
- In total: 17 validation tests.
- Results of performance tests:
  - Running 02\_Val test:
    - BIP architecture
    - AVIRIS image (Nx = 677; Ny = 512; Nz = 224; 16 bits per sample)
    - Run-time configuration enabled
  - AHB frequency: 125 MHz; Core frequency: 62.5 MHz
  - Total from reception of first valid input sample until end of compression: 77987924 cycles
  - Throughput ~ 1 Gbps
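The reported throughput can be reproduced from the figures above. A short Python check, assuming the cycle count is measured at the 62.5 MHz core clock (the function name is illustrative):

```python
def throughput(nx, ny, nz, bits_per_sample, cycles, f_core_hz):
    """Return (Msamples/s, Mbit/s, cycles per sample) for one compressed image."""
    samples = nx * ny * nz
    seconds = cycles / f_core_hz
    msps = samples / seconds / 1e6
    return msps, msps * bits_per_sample, cycles / samples


# AVIRIS test image: 677 x 512 x 224 samples, 16 bits per sample
msps, mbps, cps = throughput(677, 512, 224, 16, 77_987_924, 62.5e6)
print(f"{msps:.1f} Msamples/s, {mbps:.0f} Mbit/s, {cps:.3f} cycles/sample")
```

This gives about 62.2 Msamples/s, i.e. roughly 996 Mbit/s for 16-bit samples, which corresponds to almost one sample per core clock cycle.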

![](_page_41_Picture_0.jpeg)

![](_page_41_Picture_1.jpeg)


![](_page_41_Picture_8.jpeg)

# SHyLoC – Technology mapping

![](_page_42_Picture_1.jpeg)

- Different configurations representative of use cases have been synthesized.
  - CCSDS121: 3 sets with compile-time configuration only; 1 set with run-time configuration.
  - CCSDS123: 8 sets with compile-time configuration only; 4 sets with run-time configuration.
- Synthesis has been performed for the following technologies:
  - FPGA:
    - Xilinx Virtex 5 & 5QR
    - Microsemi ProASIC3E, ProASIC3L, RTAX2000, RTAX4000 and RTG4
  - ASIC DARE 180 nm

![](_page_42_Picture_10.jpeg)

# SHyLoC – CCSDS121 mapping

- Synthesis results for Virtex5 FX130; RTG4 150 and DARE 180nm
- Baseline encoder configuration with:
  - Block size J = 16
  - Dynamic range of input samples, D = 16
  - Bit width of output buffer W\_BUFFER = 32.

| Virtex5 FX130   | Usage | Utilization |
|-----------------|-------|-------------|
| BRAM            | 0     | 0%          |
| DSP48           | 7     | 2%          |
| LUT             | 3495  | 4%          |
| Est. Freq (MHz) | 118   |             |
| Msamples/second | 118   |             |

| RTG4 150        | Usage | Utilization |
|-----------------|-------|-------------|
| MACC            | 8     | 2%          |
| RAM64x18_RT     | 11    | 5%          |
| LUTs*           | 6418  | 4%          |
| Est. Freq (MHz) | 42    |             |
| Msamples/second | 42    |             |

| DARE 180nm   | Usage |
|--------------|-------|
| Area (mm^2)  | 2.985 |
| Cells (kilo) | 55.97 |

![](_page_43_Picture_9.jpeg)


# SHyLoC – CCSDS123 mapping

- Compile-time configurations tailored to specific images, plus a runtime-configurable version.
- Synthesis performed for Virtex5 FX130 and RTG4 150.
- Baseline predictor configuration with:
  - Number of bands for prediction, P = 3
  - Weight component resolution W = 13
  - Neighbor-oriented local sums and full prediction mode.
  - Sample-adaptive encoder is always implemented.
  - Bit width of output buffer W\_BUFFER = 32.

| IMAGE          | Nx   | Ny   | Nz   | bpp |
|----------------|------|------|------|-----|
| LANDSAT        | 1024 | 1024 | 6    | 8   |
| AVIRIS         | 512  | 680  | 224  | 16  |
| AIRS           | 90   | 135  | 1501 | 14  |
| RUNTIME CONFIG | 512  | 1024 | 256  | 16  |

# Synthesis on Virtex5 FX130

![](_page_45_Picture_1.jpeg)

#### **Resource usage VS Image size VS Predictor Architecture**

![](_page_45_Figure_3.jpeg)

![](_page_45_Figure_4.jpeg)

![](_page_45_Figure_5.jpeg)

#### **AVIRIS**

# Synthesis on Virtex5 FX130

![](_page_46_Picture_1.jpeg)

#### Frequency and throughput VS Image size VS Predictor Architecture

![](_page_46_Figure_3.jpeg)

# Synthesis on RTG4 150

![](_page_47_Picture_1.jpeg)

#### **Resource usage VS Image size VS Predictor Architecture**

#### LANDSAT

![](_page_47_Figure_4.jpeg)

![](_page_47_Figure_5.jpeg)

![](_page_47_Figure_6.jpeg)

#### AVIRIS

![](_page_47_Figure_8.jpeg)

#### RUNTIME CONFIGURABLE

![](_page_47_Figure_10.jpeg)


# Synthesis on RTG4 150

![](_page_48_Picture_1.jpeg)

#### Frequency and throughput VS Image size VS Predictor Architecture

![](_page_48_Figure_3.jpeg)

# DARE 180 nm

![](_page_49_Picture_1.jpeg)

![](_page_49_Figure_2.jpeg)

#### Total area and gates

Figure: area (mm^2) and gate count (kilo) per predictor architecture (BIP, BIP-MEM, BSQ, BIL) for the LANDSAT, AVIRIS, AIRS and RUNCFG configurations.

- Notes:
  - Memories smaller than 64 words are mapped into FF.
  - There is a memory-usage overhead because port bit widths are aligned to match the memory views available for this project.

![](_page_49_Picture_7.jpeg)


# Conclusions

![](_page_50_Picture_1.jpeg)

- We have presented the hardware architecture and VHDL description of IP cores that perform lossless compression as specified by the CCSDS-121 and CCSDS-123 standards.
- The cores can work independently as well as jointly, offering simple plug-and-play compatible interfaces.
- Technology independent. Configurable at compile-time and runtime.
- Resource usage and throughput depend on the selected compile-time configuration.
- Mapped to 7 different FPGA devices: Xilinx Virtex 5 & 5QR; Microsemi ProASIC3E, ProASIC3L, RTAX2000, RTAX4000 and RTG4.
- Implementation on the Virtex5 FX130 is feasible, with a maximum throughput of 153 Msamples/s.
- Low complexity: at most 7% of the LUTs on Virtex5 and 13% on RTG4.
- Demonstrator validates design and shows throughput of up to 1 Gbps.

![](_page_50_Picture_10.jpeg)

![](_page_51_Picture_0.jpeg)

![](_page_51_Picture_1.jpeg)
