12–16 Jun 2016
Gothenburg, Sweden
Europe/Amsterdam timezone

High-performance DSP for onboard image processing

16 Jun 2016, 10:05
30m
Gothenburg, Sweden

DSP Day: COTS DSP chips and boards Session 5: DSP Software and Applications

Speaker

Mr Jamin Naghmouchi (TU Braunschweig)

Description

ABSTRACT

The evolution of Earth observation missions is driven by the development of new processing paradigms that facilitate data downlink, handling and storage. Next-generation planetary observation satellites will generate large amounts of data at very high data rates, for both radar-based and optical core applications. Real-time onboard processing can reduce the volume of data that must be downlinked and managed on the ground. Not only commonly used image compression techniques (e.g. JPEG2000) and signal processing can be performed directly on board, but also compression techniques based on a more detailed analysis of the image data (e.g. frequency/spectral analysis).

The MacSpace RC64 is a prototype DSP/ASIC for novel onboard image processing. It is being designed, developed and benchmarked within an EU FP7 project, and it targets these new demands with the aim of exceeding the current roadmaps of leading space agencies for future payload processors. The DSP, which features the CEVA X-1643 DSP IP core, will deliver 75 GMACs (16-bit), 150 GOPS and 38 single-precision GFLOPS while dissipating less than 10 W.

INTRODUCTION

Leading space agencies are planning high-resolution, wide-swath radar imaging systems aboard satellites, such as the one to be employed in the future Sentinel-1 (HRWS) or in potential Venus orbiter missions. Part of the processing could be shifted from the ground station to the satellite itself, which requires powerful real-time onboard processing [1]. Typical applications include SAR imaging and data compression. Many of these applications consist of computationally intensive kernels.
These ambitions go far beyond well-known benchmarks, which consist mostly of basic signal processing algorithms such as the Fast Fourier Transform (FFT) and Finite Impulse Response (FIR) filtering. They depend on the availability of flexible and scalable hardware and software solutions, since applications will most likely change and develop over time, and space systems will therefore need to adapt within limited time frames. Currently employed applications, such as FFT processing and BAQ compression on SAR satellites, usually do not change during the lifetime of a satellite and are therefore mostly realized in hardware (e.g. FPGA accelerators). More modern applications, by contrast, cannot be implemented economically on special-purpose hardware accelerators because of the longer development time and relatively high development costs.

We have identified the need for a platform that gives space application developers and mission planners enough flexibility to determine the feasibility of new, groundbreaking missions and to determine their parameters. The aim of the MacSpace project is to drive forward the onboard processing of complex applications such as SAR imaging. This eliminates the need to continuously transfer huge data streams to ground stations and saves significant energy, time and bandwidth, especially for planetary observation. Besides enabling latency-critical workloads, the energy saved on data transmission can be spent on onboard high-performance computing instead. One key challenge of MacSpace is therefore to match potential application requirements.

SAR IMAGE PROCESSING

Modern Synthetic Aperture Radar (SAR) systems are continuously developing towards higher spatial resolution and new modes of operation. This requires the use of high bandwidths, combined with wide azimuthal integration intervals.
For focusing such data, a high-quality SAR processing method is necessary that can deal with more general sensor parameters. Wavenumber domain (Omega-K) processing is commonly accepted to be an ideal solution to the SAR focusing problem. It is mostly applicable to spaceborne SAR data, where a straight sensor trajectory is given. Therefore, within the MacSpace project, TU Braunschweig, in close cooperation with DLR, is conducting experimental benchmarks on a representative SAR application, excluding preprocessing steps. The application consists of:

i) Range FFT
ii) Range compression
iii) Modified Stolt mapping
iv) Range IFFT
v) Azimuth FFT
vi) Azimuth compression
vii) Azimuth IFFT

Computation-wise, a single RC64 chip could process a scene of 8192 x 8192 complex values (single-precision floating point, i.e. 512 MB in total) in under 2 seconds at 300 MHz and 100% compute utilization. This estimate is based on a computation count of 60 G floating-point operations at 38 GFLOPS. The onboard data bandwidth per core is a peak of 128 bits read/write per cycle between the registers and L1 data memory, and 128 bits read (at about 50% utilization) and 32 bits write between L1 and the shared memory ('L2'). Since this bandwidth can potentially sustain the demand of the computations, reaching the best-case performance will be a matter of latency hiding. Even in the worst-case scenario, we expect the application to finish processing the data described above in under 1 minute.

MACSPACE DEMONSTRATOR

The development of a MacSpace demonstrator is part of the project and serves to validate the usability and functionality of the system. The processor architecture is implemented in a high-performance FPGA (Xilinx Virtex-7), which represents the MacSpace RC64 prototype and executes the image processing. A personal computer performs the management and the payload data handling. The GSEOS V software package is used to send preprocessed radar data, to control and monitor the prototype, and to analyse the results and qualify the performance.
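
As a rough cross-check of the single-chip estimate given in the SAR image processing section, the sketch below recomputes the scene size and the compute-bound runtime from the figures quoted there (8192 x 8192 single-precision complex samples, 60 G floating-point operations, 38 GFLOPS peak). The function names are hypothetical, and the FFT cost model (5 n log2 n operations per length-n complex FFT) is a textbook assumption that does not come from the abstract. It only illustrates how much of the 60 G budget the four FFT passes might plausibly account for.

```python
import math

# Figures quoted in the abstract.
SAMPLES_PER_SIDE = 8192   # scene is 8192 x 8192 complex samples
BYTES_PER_SAMPLE = 8      # single-precision complex: 2 x 4 bytes
TOTAL_GFLOP = 60.0        # stated computation count
PEAK_GFLOPS = 38.0        # stated single-precision peak per RC64 chip


def scene_size_mib(n=SAMPLES_PER_SIDE, bytes_per_sample=BYTES_PER_SAMPLE):
    """Memory footprint of an n x n complex scene in MiB."""
    return n * n * bytes_per_sample / 2**20


def runtime_seconds(utilization, total_gflop=TOTAL_GFLOP, peak_gflops=PEAK_GFLOPS):
    """Compute-bound runtime at a given fraction (0, 1] of peak throughput."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gflop / (peak_gflops * utilization)


def min_utilization(deadline_s, total_gflop=TOTAL_GFLOP, peak_gflops=PEAK_GFLOPS):
    """Smallest fraction of peak throughput that still meets the deadline."""
    return total_gflop / (peak_gflops * deadline_s)


def fft_pass_gflop(n=SAMPLES_PER_SIDE):
    """ASSUMPTION: one pass = n FFTs of length n at 5*n*log2(n) flops each."""
    return n * 5 * n * math.log2(n) / 1e9


if __name__ == "__main__":
    print(f"scene size:                   {scene_size_mib():.0f} MiB")
    print(f"best-case runtime:            {runtime_seconds(1.0):.2f} s")
    print(f"utilization for a 60 s bound: {min_utilization(60.0):.1%}")
    print(f"four FFT passes (assumed):    {4 * fft_pass_gflop():.1f} GFLOP")
```

With the quoted figures, the scene occupies 512 MiB and the compute-bound runtime at full utilization is roughly 1.6 seconds, consistent with the "under 2 seconds" claim. The one-minute worst case corresponds to sustaining only about 2.6% of peak throughput, which suggests that bound leaves a wide margin for memory latency.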
The RC64 offers a computing performance of 150 GOPS and 38 GFLOPS per chip, and multiple chips could be interconnected to scale to a defined performance level. High utilization of the processing resources is maintained through innovative parallel programming techniques. The main approach is to parallelize compute kernels into sufficiently small, independent tasks that each work on local data while using shared memory. A hardware task scheduler dynamically allocates, schedules and synchronizes tasks among the parallel processing cores according to the program flow. This reduces the need for an operating system (OS) and eliminates large software management and execution overhead. No OS is deployed on the cores.

RELATED WORK AND COMPARISON

Most existing processors for space applications, such as the Atmel AT697 [5], Aeroflex UT699 [6], Aeroflex Gaisler GR712RC [7] and BAE Systems RAD750 [8], provide performance levels below 1,000 MIPS and are thus unsuitable for executing high-performance “next generation digital signal processing” (NGDSP) tasks in space missions [1]. While NGDSP requirements are listed at 1,000 MIPS/MFLOPS, a more practical goal is 10,000 MIPS. Even the fastest currently available space processor, the Space Micro Proton200k [9], achieves only about 4,000 MIPS/900 MFLOPS. The performance of some space processors versus year of introduction is plotted in Figure 2.

![Figure 2][1]

Figure 2: Performance comparison of the RC64, based on the MacSpace RC64 prototype, with other space processors

Recently, the US government has adopted Tilera’s Tile processor for use in space, within the framework of the OPERA program and the Maestro ASIC [10]. Integrating 49 triple-issue cores operating at 310 MHz, it is expected to deliver a peak performance of 45,000 MIPS. Software developers working with the Maestro chip have encountered difficulties in parallelizing applications across even its 49 cores.
Some developments have underestimated the inter-core communication latencies inherent in the tiled architecture of Maestro. Because of these difficulties, programmers are forced to cram multiple different applications onto the many-core chip, which creates the additional problem of protecting each application from the others.

REFERENCES

[1] ESA, Next Generation Space Digital Signal Processor (NGDSP), http://www.esa.int/TEC/OBDP/SEMR88NO7EG_0.html, July 2012
[2] Ginosar, Aviely et al., RC64: A Many-Core High-Performance Digital Signal Processor for Space Applications, DASIA 2012
[3] Gao, B.-C., A. F. H. Goetz and W. J. Wiscombe, Cirrus cloud detection from airborne imaging spectrometer data using the 1.38 μm water vapor band, GRL, 20, 301-304, 1993
[4] D. Manolakis, D. Marden and G. A. Shaw, Hyperspectral Image Processing for Automatic Target Detection Applications, Lincoln Laboratory Journal, Vol. 14, No. 1, 2003
[5] Atmel Corp., Rad-Hard 32 bit SPARC V8 Processor AT697E (datasheet), http://www.atmel.com/Images/doc4226.pdf
[6] Aeroflex Gaisler, UT699 32-bit Fault-Tolerant LEON3FT SPARC V8 Processor, http://www.gaisler.com/index.php/products/components/ut699
[7] Aeroflex Gaisler, GR712RC Dual-Core LEON3FT SPARC V8 Processor, http://www.gaisler.com/index.php/products/components/gr712rc
[8] BAE Systems, RAD750® radiation-hardened PowerPC microprocessor (datasheet), http://www.baesystems.com/download/BAES_052281/Space-Products--RAD750-component
[9] Space Micro, Proton200k DSP-based SBC, http://www.spacemicro.com/assets/proton-200k-dspv22.pdf
[10] M. Malone, OPERA RHBD Multi-Core, MAPLD, 2009

[1]: http://amicsa.esa.int//2016/images/contrib/Macspace_RC64.png

Primary author

Mr Jamin Naghmouchi (TU Braunschweig)

Co-authors

Mr Andreas Reigber (DLR)
Hagay Gellis (CEVA Inc.)
Prof. Mladen Berekovic (TU Braunschweig)
Mr Ole Bischoff (DSI)
Mr Peleg Avieli (Ramon Chips Ltd.)
Prof. Ran Ginosar (Ramon Chips Ltd.)
Mr Rolf Scheiber (DLR)
Mr Sören Michalik (TU Braunschweig)
