# BRAVE FPGA tools quality assessment & improvements

QUEENS-FPGA Project

12th ESA Workshop on Avionics, Data, Control and Software Systems (ADCSS)

ESTEC, 16th October 2018

David Gonzalez-Arjona (dgarjona@gmv.com)
George Lentaris (glentaris@microlab.ntua.gr)

1. FRAMEWORK OF EVALUATION (PROJECTS AND BENCHMARKING)
2. EXPERIENCE: INITIAL RESULTS
3. PRELIMINARY CONCLUSIONS

# **NANOXMAP SW TOOLS**

# **QUEENS-FPGA OBJECTIVES**

QUality assessment Evaluation of European NanoXplore SW for BRAVE FPGA

- BRAVE project provided very promising European SRAM-based FPGAs for space
- FPGA development SW is really important to take full advantage of the technology
- Systematic quality assessment plan based on the identification of measurable FPGA SW tool metrics
- HW benchmark identification based on consortium background knowledge (GMV, NTUA, NanoXplore)
- Performance and reliability evaluation
- FPGA and HDL expertise from space projects under ESA contracts
- Migration or porting to BRAVE of an in-house developed IP core for a space project
  - ✓ Simulation
  - ✓ Formal proof
  - ✓ Static timing analysis
  - ✓ LUTs utilization

# **QUEENS-FPGA METHODOLOGY**

- Follow a systematic & unified approach throughout the tool assessment project
  1. metrics & methodology (Task 1)
  2. benchmark selection (Task 1)
  3. synthesis (Task 2)
  4. Place & Route (Task 3)
  5. 3<sup>rd</sup> party tools (Task 4)
  6. HW testing (Task 5)
- Different complexity level circuits
- Fair selection of comparable devices
  - Technology
  - Resources
  - Hardened
- Methodology serves a twofold purpose
  1. explore/examine multiple tool issues
     - parameters, load, trade-offs, etc.
     - compare to 3<sup>rd</sup> party tools, show quality
  2. avoid dead ends & erroneous reports
     - use feedback paths, simulations

# **QUEENS-FPGA METRICS**

### **ABILITY TO:**

- Synthesize RTL code
- Route a design
- Inferred VHDL coding and mapping

### **QUALITY OF REPORTS**

- Organization
- Synthesis estimations
- Timing reports

### **DESIGN REUSE**

- IP cores provided
- Generating new IP cores
- Support of reuse standards

### ADVANCED DESIGN FLOWS

- Floorplanning
- Incremental design
- Third-party integration (simulation)

### SW TOOL PC CONSTRAINTS:

- Memory resources
- Execution time
- CPU usage

### **USER EXPERIENCE**

- Ease of use
- Customization of the tools
- Scripting capabilities

### HARDWARE INTERACTION

- Programming time
- Verification of bitstream
- Hardware debugging

### MAIN DEV. PRODUCT METRICS

- RTL optimization
- Area vs timing vs power
- Constraints

# **FPGA TRADE-OFF ANALYSIS**

- Very good power consumption figures
- Radiation tolerance and hardening are really good
- Available FPGA resources open up possibilities for European technology
  - Digital signal processing, image processing, compression, interfaces
  - Increasing number of resources with each new FPGA to be released
  - Many relatively complex circuits fit on NG-MEDIUM
  - SECDED EDAC in embedded memory
- Interfaces provide many high-speed options
  - SpW: FPGA firmware configuration and HW SpW D/S encoding/decoding
- There is no commercial restriction → fully European development
  - → NG-LARGE → NG-ULTRA

## **EVALUATION RESULTS BASES**

Does it do the job? Can you use it for your mission?
- SW tools comparison: usability, flexibility, reporting, inference, libraries
- HW devices comparison: power, performance, area

# **QUEENS-FPGA SYNTHESIS**

- Built in the same technology node: 65 nm, SRAM-based
- Similar in architecture/technology
  - 4-input LUTs
  - DSPs (18×18 bits)
  - Registers
  - DFFs
- Similar available resources
  - Embedded RAM blocks
  - Number of LUTs, DSPs, DFFs
- Space-grade devices
  - Rad-hard high-performance devices (not so many)
- Each device is supported by its vendor SW tool
  - NanoXmap 2.7.0 to NanoXmap 2.9.0
- Trade-off

# **EXPECTATIONS FOR THE TOOLS**

### **Desirable Features**

- Manual FPGA placement/routing (floorplan, editor, etc.).
- Support for multiple design strategies (e.g., optimization of performance, area, power).
- Basic steps (e.g., create new project, include/create source files and constraints), tool options (e.g., synthesis and PAR options) and outcome reports (e.g., resources, execution time, etc.) included in the FPGA SW tools GUI.
- Static timing analyser (will be included): detailed static timing analysis for all paths and nets (coverage percentage given by the user).
- Fine-grain management of resources (LUTs, DFFs, DSPs, etc.) via floorplan and constraints files.
- The GUI allows you to create a new project or load existing ones when opening the tool.
- Explorer functions to search files/projects, with proper messages if opening them or adding them to an existing project fails.
- Automatic creation of the dependency tree between the files included in the project, and a hierarchy viewer.
- When synthesis or PAR processes fail to complete, the tool should give comprehensive messages that help the user identify the errors easily.
- Infos, warnings and errors tabs/windows in the FPGA SW tools GUI.
- Critical warnings when the ucf constraints are not specified accordingly.
- Notification when a vhd file is not included in the hierarchy tree.
- HDL file editor, or third-party tool integration to edit the files.
- Automatic display of the VHDL parameters contained in the project (from package files).
- Invoke the simulator via the FPGA SW tools GUI.
- Support design attributes in the vhd sources.
- Include vhd templates (memories, FSMs, DSPs, etc.) suited to the optimization and inference capabilities of the NanoXmap synthesizer. This also increases productivity.
- Schematic view of the synthesized design.
- Option to perform multiple synthesis and PAR runs (e.g. different tool and design configurations) and store the corresponding results.
- Separation of the source vhd design files from other types of vhd files, such as testbenches and configuration packages.
- Build separate, organized directories for synthesis, PAR, IPs, etc. in the same workspace.
- Capability of initializing memories (e.g. RAMBs) from a txt file.
- In the synthesis report, the utilized memory should be presented both in RAMBs and in bits.

# **BENCHMARK CIRCUITS 1/3**

### **SIMPLE CIRCUITS:**

- Light in terms of area utilization
- Light in terms of effort required from the SW tools
- Simplicity

### **MID-COMPLEX CIRCUITS:**

- Heavier in terms of area utilization
- More demanding for the SW tools
- Less complex than DC3
- Still far from the FPGA resource limits

### **COMPLEX CIRCUITS:**

- Most complex of the designs
- The selection merits a dedicated section
- Stress the FPGA synthesis challenges

| Benchmark | Description | Configuration C1 | Configuration C2 | Configuration C3 |
|-----------|-------------|------------------|------------------|------------------|
| COMB | Multiplexer | Case statement | If-else statement | Concurrent statement |
| COUNTER | 4-bit counter | Asynchronous reset | Synchronous reset with priority enable | Synchronous priority reset with enable |
| BCD | 3-digit BCD counter | Standard VHDL design method | 2-process VHDL design method | - |
| REGBANK | Register bank with 16 registers of 32 bits | Case statement | Indexed array | - |
| FSM | Finite state machine with 8 states | OneHot encoding | Binary encoding | - |
| PWM | PWM (configurable frequency and duty cycle) | Without DSP | With DSP | - |
| VGA | VGA controller | Default | - | - |
| BLINK | LED control | Default | - | - |
| FPMULT | 32-bit floating point multiplier | Without DSP | With DSP | - |
| FIFO | 7\*1024\*18 bits FIFO | With BRAMs | Without BRAMs | - |

# **BENCHMARK CIRCUITS 2/3**

| Benchmark | Description |
|-----------|-------------|
| VGA | VGA video controller |
| SDRAM controller | SRAM/SDRAM memory management |
| A-TME | CCSDS Telemetry Encoder |
| CAN | CAN_OC GR-CAN-2.0 CAN-OPEN |
| LEON3 | Soft-processor (minimum configuration) |

# **BENCHMARK CIRCUITS 3/3**

| Benchmark | Description | NG-MEDIUM utilization |
|-----------|-------------|-----------------------|
| FIR | Signal filtering | ~30% logic |
| Disparity | Image depth extraction | ~50% memory, ~10% logic |
| Harris | Image corner detection | ~50% memory, ~50% logic |
| Relative Navigation (porting existing design) | Image feature tracking | Re-design to fit in NG-MEDIUM, up to the limits of resources |

# **HIGH-PERFORMANCE AVIONICS**

HIPNOS

- Computational demands are constantly increasing in space applications (on-board processing)
  - Vision Based Navigation
  - for rovers, landers, spacecraft proximity operations
  - autonomy, reliability, more and more data on-board...
- Co-processor for high-frequency, high-performance applications
- Offload space-grade processors
  - 1-2 orders of magnitude slower vs COTS counterparts
- NG-MEDIUM: new avionics? HW accelerators?

### SPACE-GRADE EUROPEAN TECHNOLOGY

# ACCELERATION OF IMAGE PROCESSING IN NG-MEDIUM

# RESULTS

- Rosetta images
- Replication of fly-by navigation
- Not in real-time
- Not in closed loop
- HW/SW split

# PRELIMINARY CONCLUSIONS

- Architecture matters: learn how to optimize the design for the new board architecture
- Porting → wrappers over specific HW resources (DSPs, BRAMs)
- Very responsive vendor support team
- Improvements version by version
- Corner cases or different improvement targets make some parameters worse
- NG-MEDIUM is on the market → NanoXplore is an actual competitor → portfolio increasing → good for industry
- Great improvement of generated reports in version 2.8.3
- Efficient use of DFFs in low and medium complexity designs
- Reduced use of LUTs and DFFs when RAMB or DSP are used in low and medium complexity designs
- Very low dynamic power consumption reported
- Competitive implementation times
- Depends on reporting
- Increased use of LUTs compared to 3<sup>rd</sup> party tools ~[+10% to +60%] → (LUTs for HDL, LUTs pass-through) [architecture matters]
- Lower maximum frequencies achieved ~[+30% to +50%]
- Less flexibility to tailor elements in designs

# BENCHMARK ON COMPLEX ALGORITHMS

# **HIGH-PERFORMANCE DSP BENCHMARKS**

- Set of functions developed in-house for past ESA activities
  - mostly image processing for navigation algorithms
  - initially on Xilinx FPGAs, in parametric VHDL

1. depth extraction
2. corner detection
3. filtering
4. feature matching
5. blob extraction
6. edge detection
7. feature description

etc. (~15 total)

QUEENS-FPGA in ADCSS 2018 - ESTEC 16/10/2018. Unclassified information.

# **BRAVE ASSESSMENT DC3 METHODOLOGY**

- DC3 testing based on custom **methodology**, of multiple steps
- **improved** during project according to test experience
- grouped in 4 successive & interdependent phases:
  - Benchmark Selection
  - Synthesis
  - Place & Route
  - HW execution
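The Benchmark Selection phase can be read as a simple additive scoring exercise. The following is a minimal sketch, not the project's actual tooling: it copies the per-criterion scores from the benchmark analysis table in this deck (feasibility, scalability, diversity, throughput, debugging, demo) and sums them with equal weight, which is how the table's TOTAL column adds up. The names `CRITERIA`, `SCORES` and `rank_benchmarks` are hypothetical, and the tie-breaking rule (table order) is an assumption; the slides do not say how ties were resolved.

```python
# Criteria and scores copied from the benchmark analysis table in this deck.
CRITERIA = ("feasibility", "scalability", "diversity",
            "throughput", "debugging", "demo")

SCORES = {
    "Disparity":  (3, 3, 2, 3, 3, 3),
    "Spacesweep": (3, 2, 3, 3, 1, 3),
    "Harris":     (3, 3, 3, 3, 2, 3),
    "SURFdet":    (2, 1, 2, 3, 2, 3),
    "SIFTdesc":   (3, 1, 1, 2, 1, 1),
    "SURFdesc":   (1, 1, 3, 2, 1, 1),
    "SIFTmatch":  (2, 1, 3, 2, 3, 1),
    "BRIEFmatch": (2, 1, 3, 2, 3, 1),
    "FIR":        (3, 3, 1, 3, 3, 3),
}


def rank_benchmarks(scores):
    """Return (name, total) pairs sorted by descending total score.

    Equal weighting is used for all criteria. Ties keep their table order
    because Python's sort is stable (an assumption about tie-breaking).
    """
    totals = [(name, sum(values)) for name, values in scores.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Print the three highest-scoring candidates.
    for name, total in rank_benchmarks(SCORES)[:3]:
        print(f"{name}: {total}")
```

With these inputs the three highest totals are Disparity (17), Harris (17) and FIR (16), which matches the selection reported in the results.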
# **BRAVE ASSESSMENT DC3 METHODOLOGY**

- goal: test all aspects of *NanoXmap*
  - tool configurations
  - RTL mapping efficiency
  - quality of results
  - quality of reports & manuals
- <u>key points</u> in our diagrams:
  1. brute-force examine all parameters
     - similarly for the mature 3<sup>rd</sup> party tools
     - create database with 1000's of results
  2. **compare** *NanoXmap* results, both vs mature tools and by-hand estimations
  3. follow outliers in reports, recursively
     - use smaller benchmarks to pinpoint problems
  4. stages: low/HDL-level and high/UI-level tuning
  5. **simulations** for verification (netlist vs behavioral)

# **RESULTS: BENCHMARK ANALYSIS**

- studied complexity & suitability
- data from 1000+ syntheses
  - by changing parameters of algorithms & tools (7K results)
  - created comparison tables and plots (per tool & benchmark)
- best = **Disparity, Harris, FIR** (selection by 6 criteria)
  - cover all FPGA resource types
  - NG-MEDIUM utilization 11-98%

| Benchmark | feasibility | scalability | diversity | throughput | debugging | demo | TOTAL |
|---------------|---|---|---|---|---|---|----|
| 1. Disparity | 3 | 3 | 2 | 3 | 3 | 3 | 17 |
| 2. Spacesweep | 3 | 2 | 3 | 3 | 1 | 3 | 15 |
| 3. Harris | 3 | 3 | 3 | 3 | 2 | 3 | 17 |
| 4. SURFdet | 2 | 1 | 2 | 3 | 2 | 3 | 13 |
| 5. SIFTdesc | 3 | 1 | 1 | 2 | 1 | 1 | 9 |
| 6. SURFdesc | 1 | 1 | 3 | 2 | 1 | 1 | 9 |
| 7. SIFTmatch | 2 | 1 | 3 | 2 | 3 | 1 | 12 |
| 8. BRIEFmatch | 2 | 1 | 3 | 2 | 3 | 1 | 12 |
| 9. FIR | 3 | 3 | 1 | 3 | 3 | 3 | 16 |

# **RESULTS: SYNTHESIS**

- tested versions 2.7.1 to 2.8.6, all parameters: MappingEffort, MaxRegisterCount, TimingDriven, MergeRegisterToPad, DefaultROMMapping, DefaultRAMMapping, DefaultFSMEncoding, AdderToDSPMapThd, MultiplierToDSPMapThreshold, LessThanToDSPMapThreshold, ...
  - a few were not functioning, **now corrected** in latest versions
  - many give the user the ability to drive synthesis efficiently
- <u>competitive resources vs 3<sup>rd</sup> party tools</u>, still room for improvement
  - logic: -17% LUTs (+79% incl. pass-through), +27% DSPs
  - memory: -2% RAM blocks (+28% in RAM bits), +19% DFFs
  - but with spikes/outliers: +129% DFF, +275% DSP, +465% LUT
  - now up to 30% self-improvement in v2.9.0
- long tests for netlist correctness
  - few problems in simulation
  - synthesizer bugs now corrected

# **RESULTS: PLACE & ROUTE**

NG-MEDIUM, NanoXmap v.2.8.6

- diverse utilization ratios
- almost same resources from synthesis to P&R

| Benchmark | LUTs | DFFs | RAMB | DSPs | CARRY |
|--------------|-------|-------|-------|-------|-------|
| FIR C2 | 33.8% | 33.7% | 0 | 57.1% | 15.6% |
| Disparity C1 | 12.5% | 10.2% | 51.8% | 40.2% | 19.0% |
| Harris C3' | 45.3% | 33.6% | 76.8% | 65.2% | 73.4% |

- 'timedriven' = 10-20% *fclk* boost with same resources on FPGA (but with ~60% increase of P&R time, v.2.8.5)
- compared to 3<sup>rd</sup> party tools: less than half max *fclk* (~40%), but in good range (50-110 MHz)
- correct post-PAR netlists (simulation)
- floorplan view: no control, but helps understanding and guiding synthesis

NG-MEDIUM, floorplan view of "Harris C7"

# **RESULTS: HW EXECUTION**

- integrated with 3 Mbps UART (for HW/SW co-processing)
- all 3 benchmarks operate correctly on NG-MEDIUM
  - FIR @ 110 MSPS
  - corner detection @ 48 FPS (image 512x512)
  - disparity @ 2x accuracy + 10x speed (vs Mars rovers now)
- related **tests successful**
  - <u>chip primitives</u>: PLLs, specific multi-clock domain crossing
  - <u>board components</u>: EEPROM, LEDs, headers,...
  - <u>NxBase2 SW</u>: bitstream verify, status,...
  - <u>chip reconfiguration</u>:
    - correct for repeated tests,
    - ~faster vs competitors,
    - time proportional to size

# **CONCLUSIONS/SUMMARY FOR DC3 TESTS**

- entire toolchain verified now
  - ✓ considerable progress seen from version to version
- **lightweight tool** (faster vs others), user friendly, good reports
- still room for improvement
  - <u>in SW</u>: DFF absorption, RAMB mapping,...
    (now testing v2.9.x)
  - <u>in HW</u>: speed, size (NG-LARGE will provide 4x resources)
- successfully executes intensive DC3 DSP algorithms
  - sufficient performance for space applications
  - competitive in terms of reconfiguration

### CONCLUSION

- Real results and implementations for the space industry, to alleviate the risks of adopting new technologies.
- The QUEENS-FPGA project is a detailed and methodical assessment, with experienced engineers spending a lot of time and effort to obtain deep analysis and fair comparison results.
- Many findings were reported and corrected during the project.
- NX SW tools are much more mature than at the beginning of the project. The ESA FPGA flow can now be fully followed.
- GMV would be able to initiate new projects for space programs using NX FPGAs and SW tools. This means that the results of this evaluation are good enough for the space industry to claim that NX tools are at a maturity level suitable for incorporating NG-MEDIUM in their projects.
- The established methodology provides good-quality results for the evaluation assessment.
- NanoXmap tools are being improved. Some capabilities still need work, and more tools and IPs are to come.
- The NG-MEDIUM FPGA board meets many space requirements and application expectations.

## **DEMO VIDEOS**

GMV BACKGROUND EXPERIENCE

# **Questions? Suggestions?**

Comet 67P/Churyumov-Gerasimenko