ABSTRACT
The evolution of Earth Observation missions is
driven by the development of new processing
paradigms to facilitate data downlink, handling and
storage. Next-generation planetary observation
satellites will generate large volumes of data at very
high data rates, for both radar-based and optical core
applications.
Real-time onboard processing can reduce the data
downlink volume and the data management effort on
the ground. Not only commonly used image
compression techniques (e.g. JPEG2000) and signal
processing can be performed directly on board, but
also compression techniques based on more detailed
analysis of the image data (e.g. frequency/spectral
analysis).
The MacSpace RC64 is a prototype DSP/ASIC for
novel onboard image processing. It is being
designed, developed and benchmarked in the
framework of an EU FP7 project and targets these new
demands, aiming at a significant step beyond the
current roadmaps of leading space agencies for future
payload processors. The DSP, featuring the
CEVA X-1643 DSP IP core, will deliver 75 GMACs
(16-bit), 150 GOPS and 38 single-precision GFLOPS
while dissipating less than 10 Watts.
INTRODUCTION
Leading space agencies are planning high-resolution,
wide-swath radar imaging systems aboard satellites,
such as those to be employed in the future
Sentinel-1 (HRWS) or potential Venus orbiter
missions. Part of the processing could be shifted from
the ground station to the satellite itself, requiring
powerful real-time on-board processing [1].
Typical applications include SAR imaging and data
compression. A large set of these applications consists
of computationally intensive kernels.
These ambitions go far beyond well-known benchmarks,
which mostly consist of basic signal processing
algorithms such as the Fast Fourier Transform (FFT)
and Finite Impulse Response (FIR) filtering. They
depend on the availability of flexible and scalable
hardware and software solutions, since applications
will most likely change and develop over time, and
space systems will therefore need to adapt within
limited time frames.
Currently employed applications, such as FFT
processing and BAQ compression on SAR satellites,
usually do not change during the lifetime of a
satellite and are therefore mostly realized in
hardware (e.g. FPGA accelerators). More modern
applications, by contrast, cannot be implemented
economically on special-purpose hardware accelerators
because of longer development times and relatively
high development costs. We have identified the need
for a platform that gives space application developers
and mission planners enough flexibility to determine
the feasibility of new groundbreaking missions and to
determine their parameters.
The aim of the MacSpace project is to advance on-board
processing of complex applications such as SAR
imaging, eliminating the need for continuous
transfer of huge data streams to ground stations. This
saves significant energy, time and bandwidth that are
required for data transfers, especially for planetary
observation. Besides enabling latency-critical
workloads, the energy saved on data transmission can
be spent on onboard high-performance computing
instead. One key challenge of MacSpace is therefore
matching potential application requirements.
SAR IMAGE PROCESSING
Modern Synthetic Aperture Radar (SAR) systems are
continuously developing toward higher spatial
resolution and new modes of operation. This
requires high bandwidths combined with wide
azimuthal integration intervals.
Focusing such data requires a high-quality SAR
processing method that can deal with more
general sensor parameters. Wavenumber domain
(Omega-K) processing is commonly accepted as an
ideal solution to the SAR focusing problem. It is
mostly applicable to spaceborne SAR data, where a
straight sensor trajectory is given.
Therefore, within the MacSpace project, TU
Braunschweig, in close cooperation with DLR, is
conducting experimental benchmarks on a
representative SAR application, excluding
preprocessing steps.
The application consists of:
i) Range FFT
ii) Range compression
iii) Modified Stolt Mapping
iv) Range IFFT
v) Azimuth FFT
vi) Azimuth compression
vii) Azimuth IFFT
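As an illustration of the first stages listed above, the following minimal sketch applies steps i), ii) and iv) to a single range line: a range FFT, multiplication with the conjugate reference spectrum (matched filtering), and a range IFFT. It is not the MacSpace benchmark code. The chirp parameters, the signal length and all function names are assumptions chosen for illustration, and the Stolt mapping and azimuth stages are omitted.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/n normalisation."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def chirp(n_samples, rate):
    """Linear FM pulse centred in its window (hypothetical parameters)."""
    half = n_samples / 2
    return [cmath.exp(1j * math.pi * rate * (t - half) ** 2)
            for t in range(n_samples)]


def range_compress(echo, reference):
    """Steps i), ii), iv): FFT, matched-filter multiply, IFFT."""
    n = len(echo)
    ref_padded = list(reference) + [0j] * (n - len(reference))
    echo_spec = fft(echo)                      # i) range FFT
    ref_spec = fft(ref_padded)
    product = [e * r.conjugate()               # ii) range compression
               for e, r in zip(echo_spec, ref_spec)]
    return ifft(product)                       # iv) range IFFT


# Assumed test scene: one point target whose echo starts at sample 100.
N = 256
pulse = chirp(32, rate=0.02)
delay = 100
echo = [0j] * N
for i, p in enumerate(pulse):
    echo[delay + i] += p

compressed = range_compress(echo, pulse)
peak_index = max(range(N), key=lambda i: abs(compressed[i]))
print("compressed peak at sample", peak_index)
```

In a full two-dimensional processor the same pattern would be repeated per range line and per azimuth line, which is why FFT throughput dominates the operation count.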
In terms of computation, a single RC64 chip could be
capable of processing 8192x8192 complex values
(single-precision floating point, 512 MB in total) in
under 2 seconds at 300 MHz and 100% compute
utilization. This estimate is based on an operation
count of 60 G floating-point operations at 38 GFLOPS.
The onboard data bandwidth can potentially sustain
the demand of the computations: per core, the L1 data
memory supports a peak of 128-bit reads/writes per
cycle from/to the registers, and transfers between L1
and the shared memory ('L2') reach 128-bit reads at
about 50% utilization and 32-bit writes. Reaching the
best-case performance will therefore be a matter of
latency hiding. Even in the worst-case scenario, we
still expect the application to finish processing the
data described above in under 1 minute.
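The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (image size, 60 G floating-point operations, 38 GFLOPS, the 1-minute worst case). The variable names, and the reading of the worst case as a minimum sustained utilization, are assumptions made for illustration.

```python
# Values taken from the text.
N_RANGE = 8192
N_AZIMUTH = 8192
BYTES_PER_COMPLEX = 8          # two 32-bit floats (real, imaginary)
TOTAL_FLOP = 60e9              # estimated operation count
PEAK_FLOPS = 38e9              # single-precision peak of one RC64
WORST_CASE_SECONDS = 60.0      # "under 1 minute"

# Size of the complex input image.
data_bytes = N_RANGE * N_AZIMUTH * BYTES_PER_COMPLEX
data_mib = data_bytes / 2**20

# Best case: every cycle does useful floating-point work.
best_case_seconds = TOTAL_FLOP / PEAK_FLOPS

# Fraction of peak that must be sustained to meet the worst case.
min_utilization = best_case_seconds / WORST_CASE_SECONDS

print(f"data size: {data_mib:.0f} MiB")
print(f"best case: {best_case_seconds:.2f} s")
print(f"utilization needed for 1 minute: {min_utilization:.1%}")
```

Under these assumptions the best case is roughly 1.6 seconds, and the 1-minute bound corresponds to sustaining only a few percent of peak throughput.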
MACSPACE DEMONSTRATOR
The development of a MacSpace demonstrator is part
of the project to validate the usability and functionality
of the system. The processor architecture is
implemented in a high-performance FPGA (Xilinx
Virtex 7) representing the MacSpace RC64 prototype,
which executes the image processing. A personal
computer performs the management and the payload
data handling. The GSEOS V software package is used
to send preprocessed radar data, control and monitor
the prototype as well as to analyse the results and
qualify the performance.
The RC64 delivers 150 GOPS and 38 GFLOPS per chip
and could scale to an interconnected system that
meets any defined performance level. High utilization
of the processing resources can be maintained using
innovative parallel programming techniques. The main
approach is to parallelize compute kernels by
splitting them into sufficiently small independent
tasks that each work on local data, while using
shared memory.
A hardware (task) scheduler dynamically allocates,
schedules, and synchronizes tasks among the parallel
processing cores according to the program flow.
Hence, it reduces the need for an operating system
(OS) and eliminates large software
management/execution overhead. No OS is deployed to
the cores.
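The task-based approach described above can be illustrated with a small software analogy. In this sketch a thread pool stands in for the hardware scheduler, and each task processes one row of a shared buffer. This is not the RC64 programming model or its API. The function names, the row-wise split and the gain kernel are hypothetical and chosen only to show the pattern of small independent tasks over shared memory.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_row(shared, row, gain):
    """One small task: reads and writes only its own row."""
    shared[row] = [gain * v for v in shared[row]]
    return row


def run_kernel(shared, gain, workers=4):
    """Split the kernel into per-row tasks and wait for all of them.

    The pool plays the role of the scheduler: it hands tasks to idle
    workers as they become free, and collecting the results acts as
    the synchronisation point at the end of the kernel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scale_row, shared, r, gain)
                   for r in range(len(shared))]
        return [f.result() for f in futures]


# Assumed example data: a 16x8 "image" held in shared memory.
image = [[float(r * 8 + c) for c in range(8)] for r in range(16)]
expected = [[2.0 * v for v in row] for row in image]

completed = run_kernel(image, 2.0)
print("tasks completed:", len(completed))
```

Because no two tasks touch the same row, no locking is needed inside the kernel. The only synchronisation is the join once all tasks have finished.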
RELATED WORK AND COMPARISON
Most existing processors for space applications, such
as Atmel AT697 [5], Aeroflex UT699 [6], Aeroflex
Gaisler GR712RC [7] and BAE Systems RAD750 [8],
provide performance levels below 1,000 MIPS, and are
thus unsuitable for executing high-performance “next
generation digital signal processing” (NGDSP) tasks in
space missions [1]. While NGDSP requirements are
listed at 1,000 MIPS/MFLOPS, a more practical goal is
10,000 MIPS. Even the fastest currently available
space processor, Space Micro Proton200k [9], achieves
only about 4,000 MIPS/900 MFLOPS. Performance of
some space processors versus year of introduction is
plotted in Figure 2.
![Performance comparison of the RC64 with other space processors][1]
Figure 2: Performance comparison of the RC64, based
on the MacSpace RC64 prototype, with other space
processors
Recently, the US government has adopted Tilera’s Tile
processor for use in space, in the framework of the
OPERA program and the Maestro ASIC [10].
Integrating 49 triple-issue cores operating at 310 MHz,
it is expected to deliver peak performance of 45,000
MIPS. Software development for the Maestro chip has
encountered difficulties in parallelizing
applications across even the 49 cores of Maestro.
Some of the developments have underestimated the
inter-core communication latencies involved in the
tiled architecture of Maestro. Because of such
difficulties, programmers are forced to cram
multiple different applications into the many-core
processor, which creates additional difficulties in
protecting the applications from one another.
REFERENCES
[1] ESA, Next Generation Space Digital Signal Processor
(NGDSP),
http://www.esa.int/TEC/OBDP/SEMR88NO7EG_0.html, July
2012
[2] Ginosar, Aviely et al., RC64: A Many-Core High-Performance
Digital Signal Processor for Space Applications, DASIA 2012
[3] Gao, B.-C., A. F. H. Goetz and W. J. Wiscombe, Cirrus Cloud
detection from airborne imaging spectrometer data using the
1.38 μm water vapor band, GRL, 20, 301-304, 1993
[4] Manolakis, D., D. Marden and G. A. Shaw, Hyperspectral Image
Processing for Automatic Target Detection Applications,
Lincoln Laboratory Journal, 14(1), 2003
[5] Atmel Corp., Rad-Hard 32 bit SPARC V8 Processor AT697E
(datasheet), http://www.atmel.com/Images/doc4226.pdf
[6] Aeroflex Gaisler, UT699 32-bit Fault-Tolerant LEON3FT
SPARC V8 Processor,
http://www.gaisler.com/index.php/products/components/ut699
[7] Aeroflex Gaisler, GR712RC Dual-Core LEON3FT SPARC V8
Processor,
http://www.gaisler.com/index.php/products/components/gr712rc
[8] BAE Systems, RAD750® radiation-hardened PowerPC
microprocessor (datasheet),
http://www.baesystems.com/download/BAES_052281/Space-Products--RAD750-component
[9] Space Micro, Proton200k DSP-based SBC,
http://www.spacemicro.com/assets/proton-200k-dspv22.pdf
[10] M. Malone, OPERA RHBD Multi-Core, MAPLD, 2009
[1]: http://amicsa.esa.int//2016/images/contrib/Macspace_RC64.png