Description
Hardware acceleration for an edge-AI application that uses a convolutional neural network (CNN) typically involves distributing computationally intensive tasks, such as convolutions or matrix multiplications, across multiple cores running in parallel. This can be achieved with a static GPU-like (Graphics Processing Unit) architecture or with a configurable array of cores such as the one available in the AMD Versal AI Core device [1, 2]. This monolithic chip embeds a processing system, programmable logic and an array of 400 AI Engines (AIEs), which are specialized computation tiles well suited to artificial intelligence (AI) applications. The Versal offers interesting potential for optimized AI acceleration [3, 4], combined with the flexibility of a configurable device for radiation hardening in space applications.
In this work, we present the testbench that we have developed for the assessment of Single Event Effects (SEE) in the Versal AIEs under laser testing [5]. It is based on ResNet50 [6], a CNN that uses residual connections to train very deep models efficiently and offers strong performance in image classification tasks. AMD has already used ResNet50 on the Versal AI Engines for SEE testing [7]. In that work, the development and design approach was based on the PetaLinux OS and the AMD Deep Learning Processing Unit (DPU) IP.
In our testbench, one of the A72 APU cores of the processing system (PS) runs a bare-metal ResNet50 application based on a light C++ framework for CNN modeling, with part of its calculation tasks delegated to the AIEs to perform accelerated onboard inference. The main test loop of this application consists of feeding in the input data, executing the lightweight ResNet50 operations, triggering the AIE graph execution and comparing the output data with the expected (golden) results. The acceleration graph uses 352 AIEs to accelerate the residual-layer calculations, including convolutions and post-convolution additions, by performing these operations in parallel. Some of the AIEs execute dispatch kernels that manage the flow of input and output data.
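The four steps of the main test loop can be illustrated with a minimal, host-only C++ sketch. It is not the testbench code: the names `Tensor`, `Stage`, `RunResult`, `count_mismatches` and `run_test_loop` are hypothetical, and the accelerator is represented by an ordinary callable rather than by any real AIE graph API. The sketch only shows the control flow of feeding the input, running the host-side operations, invoking the accelerated stage and comparing the output with golden data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical types for illustration; none of these names come from the testbench.
using Tensor = std::vector<std::int8_t>;
using Stage = std::function<Tensor(const Tensor&)>;

struct RunResult {
    std::size_t iterations = 0;  // completed loop iterations
    std::size_t mismatches = 0;  // output elements that differ from golden
    std::size_t bad_runs = 0;    // iterations with at least one mismatch
};

// Count the elements of `out` that differ from the golden reference.
std::size_t count_mismatches(const Tensor& out, const Tensor& golden) {
    assert(out.size() == golden.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] != golden[i]) ++n;
    }
    return n;
}

// One iteration = feed input, run host-side ops, run the accelerated stage,
// then compare the result with the golden output.
RunResult run_test_loop(const Tensor& input, const Tensor& golden,
                        const Stage& host_ops, const Stage& accel_graph,
                        std::size_t iterations) {
    RunResult r;
    for (std::size_t it = 0; it < iterations; ++it) {
        Tensor pre = host_ops(input);    // lightweight operations on the host core
        Tensor out = accel_graph(pre);   // stand-in for triggering the AIE graph
        std::size_t bad = count_mismatches(out, golden);
        r.mismatches += bad;
        if (bad != 0) ++r.bad_runs;
        ++r.iterations;
    }
    return r;
}
```

Passing the two stages as callables keeps the loop independent of the platform, so the same comparison logic can be exercised on a workstation with a software model in place of the accelerator. Counting mismatching elements per iteration, rather than only a pass/fail flag, is one possible choice for distinguishing small corruptions from widespread ones.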
Further details about the development, the design flow and the graph implementation will be presented, and some specific challenges will be discussed and illustrated by laser-testing results on individual AIEs.
References:
[1] P. Maillard et al., "Neutron and 64MeV Proton Characterization of Xilinx 7nm Versal Multicore Scalar Processing System (PS)," IEEE REDW, pp. 18-22, USA, 2022.
[2] A. Dufour et al., "Heavy-ion and proton Single Event Effect (SEE) characterization of 7nm FinFET AMD Versal," RADECS, Toulouse, France, 2023.
[3] A. Arora et al., "MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine," ICFPT, pp. 96-105, Japan, 2023.
[4] A. Leftheriotis et al., "Evaluating Versal ACAP and conventional FPGA platforms for AI inference," MOCAST, Greece, 2023.
[5] S. Achaq et al., "Bottom-up Analysis of the Impact of Single Event Effects in a CNN Hardware Accelerator using Laser Testing," RADECS, Canary Islands, Spain, 2024.
[6] K. He et al., "Deep Residual Learning for Image Recognition," IEEE CVPR, USA, 2016.
[7] P. Maillard et al., "Protons Evaluation of 7nm Versal AI Engines (AIE) based Radiation Tolerant Platform for Deep Learning Applications," NSREC, Ottawa, Canada, 2024.
Affiliation of author(s)
S. Achaq (1,2), V. Pouget (2), L. Artola (1), J. Boch (2)
1, ONERA, Univ. Toulouse, France. 2, IES, Univ. Montpellier, CNRS, Montpellier, France.
Track
Design Flow