# DSP Benchmark Results of the GR740 Rad-Hard Quad-Core LEON4FT

Topics: Status and results of DSP related ESA contracts, Space qualified DSP components

#### Javier Jalle, Magnus Hjorth, Jan Andersson

Cobham Gaisler, Kungsgatan 12, SE-411 91, Göteborg, Sweden, Tel: +46 31 775 86 50, {javier.jalle,magnus.hjorth,jan.andersson}@gaisler.com

#### Roland Weigand, Luca Fossati

European Space Agency, Keplerlaan 1 – PO Box 299, 2220AG Noordwijk ZH, The Netherlands, Tel: +31 71 565 65 65, {roland.weigand,luca.fossati}@esa.int

#### **ABSTRACT**

The GR740 microprocessor device is a SPARC V8(E) based multi-core architecture that provides a significant performance increase compared to earlier generations of European space processors. The device is the result of the European Space Agency's initiative to develop a European Next Generation Microprocessor (NGMP). Engineering models were manufactured in 2015 and tested during the first quarter of 2016. Space qualification of flight models is planned to start in the second half of 2016. The GR740 is the highest performing European space-grade general purpose microprocessor and, due to the presence of four powerful floating-point units, it is suitable for executing DSP applications. This abstract provides an overview of the GR740 and a subset of the benchmarks used within the ESA activity's functional validation effort.

#### **BACKGROUND**

The LEON project was started by the European Space Agency in late 1997 to study and develop a high-performance processor to be used in European space projects. Following the development of the TSC695 (ERC32) and AT697 processor components in 0.5 and 0.18 $\mu$m technology, respectively, ESA initiated the Next Generation Microprocessor (NGMP) activity targeting a European Deep Sub-Micron (DSM) technology in order to meet increasing requirements on performance and to ensure the supply of European space processors.
Cobham Gaisler was selected to develop the NGMP system, which is centred around the new LEON4FT processor. Throughout 2014 and 2015, the architecture was designed and manufactured in the C65SPACE platform from STMicroelectronics [4]. This chip, now called GR740, constitutes the NGMP Engineering Model. Besides the chip development, the existing SPARC software development environment has been extended with support for the GR740.

Figure 1: GR740 Block diagram

#### **ARCHITECTURAL OVERVIEW**

Figure 1 shows an overview of the GR740 architecture. The four LEON4FT processors are connected to a shared bus which connects to a 2 MiB EDAC protected Level-2 cache before reaching external EDAC protected SDRAM. Each LEON4FT processor has a dedicated pipelined IEEE-754 floating-point unit. While the GR740 implementation of LEON4FT lacks support for dedicated multiply-and-accumulate instructions, this is mitigated by the large number of processor registers, the L1 cache memory and the high operating frequency.

The main communication interfaces of the device include eight external SpaceWire ports connected to an on-chip SpaceWire router, two 10/100/1000 Mbit Ethernet ports, MIL-STD-1553B and 32-bit PCI. The design makes use of extensive clock gating for the communication interfaces and the processors, which can be put in a power-down mode to conserve power when some or all cores are unused.

The four parallel CPU/FPU cores, each with dedicated separate instruction and data L1 caches (Harvard architecture) and running at a 250 MHz clock frequency, can theoretically provide up to 1 Gflop/s in single or double precision. Together with the multiple SpaceWire and Ethernet interfaces, this makes the GR740 suitable for DSP applications, provided that the application is parallelised efficiently and streams its data efficiently across the shared on-chip buses.
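The 1 Gflop/s figure follows from simple arithmetic on the core count and clock frequency quoted above. The following is a minimal sketch of that calculation, not a measurement. It assumes that each FPU can complete one floating-point operation per clock cycle, which is inferred from the quoted figures rather than stated in the text. The function names and the `instructions_per_mac` parameter are hypothetical, and the multiply-accumulate estimate additionally assumes that, without a dedicated instruction, each multiply-accumulate is issued as one multiply plus one add.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: float = 1.0) -> float:
    """Upper bound on floating-point operations per second for the whole device.

    flops_per_cycle = 1.0 is an assumption inferred from the quoted
    4 cores x 250 MHz = 1 Gflop/s figure.
    """
    return cores * clock_hz * flops_per_cycle


def peak_macs(cores: int, clock_hz: float, instructions_per_mac: int = 2) -> float:
    """Upper bound on multiply-accumulates per second when a MAC is issued
    as separate instructions (assumed here: one multiply plus one add)."""
    return peak_flops(cores, clock_hz) / instructions_per_mac


# Figures quoted in the text: four cores at 250 MHz.
GR740_CORES = 4
GR740_CLOCK_HZ = 250e6

print(f"peak: {peak_flops(GR740_CORES, GR740_CLOCK_HZ) / 1e9:.1f} Gflop/s")
print(f"peak MAC rate (assumed 2 instructions per MAC): "
      f"{peak_macs(GR740_CORES, GR740_CLOCK_HZ) / 1e6:.0f} MMAC/s")
```

The result is an upper bound only; the throughput actually sustained depends on how well an application is parallelised and how it streams data across the shared on-chip buses.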
The suitability of the GR740 for DSP applications can be demonstrated with dedicated DSP benchmarks, such as those suggested in [1]. The NGMP architecture has already been evaluated in an effort where the GAIA VPU application was adapted to take advantage of a multi-core system. The conclusion from this effort was that the GR740 is fast enough to run the GAIA VPU application [2].

#### **FUNCTIONAL VALIDATION AND DSP BENCHMARKS**

The functional validation of the GR740 device builds on existing tests used in the frame of the NGMP activities. The tests include both functional and performance benchmarks.

**PARSEC 2.1 benchmarks:** PARSEC is a set of multithreaded shared-memory benchmarks. We run them with different numbers of cores. To show the benefit of multiple cores, we calculate the speedup as:

$$S_{up} = \frac{T_1}{T_2}$$

where $T_1$ is the execution time with one core and $T_2$ is the execution time with the number of cores under evaluation. In an ideal parallel application with no overheads, the speedup obtained with 4 cores would be 4x. Figure 2 shows the speedup of a set of the PARSEC 2.1 small workloads under Linux. We observe a speedup of up to almost 3.5x on the *swaptions* benchmark and of 1.83x on average with 4 cores.

Figure 2: PARSEC benchmarks speedup

**Barcelona Supercomputing Center Multicore OS benchmarks:** These benchmarks were designed to evaluate the multicore interference for different operating systems [5]. We use a subset of the benchmarks that continuously access the L2 cache with different patterns: *L2-128K* and *L2-256K* use 128K and 256K of L2 space, *L2-miss* is designed to miss on the L2 cache and *ST* performs store operations that hit on the L2 cache. These four benchmarks are highly sensitive to interference when running in multicore. We execute these benchmarks on a single core without interference and with all other CPUs running *L2-miss* to generate an extreme interference scenario.
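The timing ratios used in this section can be sketched in a few lines: the speedup defined above, and its inverse, which expresses how much longer a benchmark takes under interference than in isolation. The function names are hypothetical and the timings in the example calls are made-up values chosen for illustration, not measurements. Parallel efficiency (speedup divided by core count) is a commonly used derived metric that is not reported in this paper; it is included here only to put the 4-core average of 1.83x in relation to the ideal 4x.

```python
def speedup(t_one_core: float, t_multi_core: float) -> float:
    """S_up = T_1 / T_2: single-core time over multi-core time."""
    if t_one_core <= 0 or t_multi_core <= 0:
        raise ValueError("execution times must be positive")
    return t_one_core / t_multi_core


def slowdown(t_isolated: float, t_interfered: float) -> float:
    """Inverse of the speedup: time under interference over time in isolation."""
    if t_isolated <= 0 or t_interfered <= 0:
        raise ValueError("execution times must be positive")
    return t_interfered / t_isolated


def parallel_efficiency(measured_speedup: float, cores: int) -> float:
    """Fraction of the ideal speedup (equal to the core count) that was achieved."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return measured_speedup / cores


# Made-up timings in seconds, for illustration only.
print(speedup(t_one_core=10.0, t_multi_core=2.5))
print(slowdown(t_isolated=2.0, t_interfered=6.0))

# Reported PARSEC average speedup on 4 cores, expressed as efficiency.
print(f"{parallel_efficiency(1.83, cores=4):.2%}")
```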
We calculate the slowdown (the inverse of the speedup), which effectively measures the impact of the interference that the other cores generate. Figure 3 shows the slowdown for the above-mentioned benchmarks. We observe that the slowdown reaches almost 3.1x for the *ST* benchmark, which is the most sensitive because, in the absence of interference, store operations are very efficient due to the write buffers.

Figure 3: BSC Multicore OS benchmarks slowdown

**EEMBC benchmarks:** We have successfully compiled and run the EEMBC CoreMark, Autobench, FPMark and Multibench benchmark suites. In this paper, we present the results of the CoreMark and Autobench suites, which may be of interest to a DSP audience. In order to compare the GR740 with previous LEON processors, we run CoreMark on a single core of the UT699, GR712RC and GR740. Figure 4 shows the CoreMark scores [3] obtained on a single core. We can see a significant increase on the GR740 with respect to the previous processors, mainly due to the higher clock frequency (250 MHz vs 50 MHz). The difference would be even larger if all four cores of the GR740 were considered, compared with the 2-core GR712RC or the single-core UT699.

Figure 4: CoreMarks per core for different LEON processors

Figure 5 shows the iterations/sec of the EEMBC Autobench suite under single-core Linux, from which we compute an AutoMark score of 111.97, comparable with the scores shown in [3].

Figure 5: EEMBC automotive benchmarks

**CCSDS 123 Image Compression:** This software implements lossless multispectral and hyperspectral compression according to the draft standard CCSDS 123.0-R-1. We have run 4 compressions under Linux using one and four CPUs, showing a speedup factor of 3.43x.

#### **CONCLUSION**

The GR740 is a SPARC V8(E) based multi-core architecture that provides a significant performance increase compared to earlier generations of European space processors, with high-speed interfaces such as SpaceWire and Gigabit Ethernet on-chip.
The platform has improved support for profiling and debugging, and software tools have been upgraded to this new architecture. Moreover, a rich set of software is immediately available due to backward compatibility with existing SPARC V8 software and LEON3 board support packages.

The GR740 constitutes the engineering model of the ESA NGMP, which is part of the ESA roadmap for standard microprocessor components. It is developed under ESA contract, and it will be commercialised under fair and equal conditions to all users in the ESA member states. The GR740 is also developed entirely with manpower located in Europe, and it relies only on European IP sources. It will therefore not be affected by US export regulations.

The functional validation effort aims to validate the functionality of the device and of the development board that will be made available to the space industry. The GR740 is the highest performing European space-grade processor to date, and results of DSP benchmarks will be presented to allow industry to assess the GR740's suitability for DSP applications.

News about the GR740 device can be found at the following link: http://www.gaisler.com/gr740

#### **REFERENCES**

- [1] Next Generation Space Digital Signal Processor Software Benchmark, Issue 1.0, TEC-EDP/2008.18/RT, 01 December 2008
- [2] RTEMS SMP Executive Summary, Issue 1, Revision 2, RTEMSSMP-ES-001, March 2015, http://microelectronics.esa.int/ngmp/RTEMS-SMP-ExecSummary-CGAislerASD-OAR.pdf
- [3] EEMBC, The Embedded Microprocessor Benchmark Consortium, http://www.eembc.org/
- [4] P. Roche, G. Gasiot, S. Uznanski, J-M. Daveau, J. Torras-Flaquer, S. Clerc, and R. Harboe-Sørensen, "A Commercial 65 nm CMOS Technology for Space Applications: Heavy Ion, Proton and Gamma Test Results and Modeling", IEEE Transactions on Nuclear Science, vol. 57, no. 4, August 2010
- [5] Francisco J. Cazorla et al., Multicore OS benchmarks, Technical Report, Contract 4000102623, European Space Agency, 2012