Session 6a: AI Inference Frameworks and Acceleration on Space Devices
Tracks
Day 2 - On-Board Processing Benchmarks and AI Acceleration
Tuesday, June 15, 2021 | 4:20 PM - 4:40 PM
Speaker
Dr. Leonidas Kosmidis
Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC)
Reliable Machine Learning Acceleration for Future Space Processors and FPGAs: LEON, NOEL-V and TASTE
Abstract Submission
There is increasing interest in artificial intelligence (AI) and machine learning (ML) for on-board processing in space [1], as recent missions show. Perseverance uses AI for navigation; however, the limited processing capabilities of existing space processors allow only short navigation distances and low speeds. Commercial off-the-shelf (COTS) accelerators are therefore being explored, such as the Intel Movidius in ESA’s Φ-Sat-1 mission, where AI is used to detect clouds in satellite images of the Earth. However, because of their low radiation tolerance, COTS accelerators cannot be used beyond low Earth orbit (LEO) or in long-term missions such as institutional ones. Moreover, the software stacks of COTS accelerators frequently depend on operating systems that are not space-qualified, such as Linux.
Therefore, ML acceleration features are needed in qualified space processors and FPGAs. We present two such solutions: a low-cost vector unit co-designed for AI, which increases the AI performance of space processors, and an implementation of state-of-the-art, low-cost binarised neural networks (BNNs) on FPGAs that uses ESA’s TASTE framework for a reliable software stack. Both solutions are implemented in VHDL and are open source.
Our vector unit is portable and has been designed for both the LEON3 and NOEL-V space processors. It supports 8-bit operations with optional saturation, as well as reduction operations, and it reuses the existing register file, which keeps the overhead minimal and guarantees backwards compatibility. According to Vivado, integrating the module with the smallest LEON3 implementation increases area by only 30%, which is small compared with other vector units for microcontrollers [2]. The impact on processor frequency is also small: the design reaches 90 MHz, compared with 100 MHz for the baseline LEON3. Preliminary results on matrix multiplication, a universal ML building block used to implement fully connected and convolutional layers, indicate speedups of up to 3.8×. We are currently working on compiler support to enable evaluation on more relevant ML cases, and we expect to present these results at the workshop.
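To make the 8-bit operations concrete, the following Python sketch models a packed-SIMD unit in software, assuming 32-bit registers that each hold four signed 8-bit lanes. The lane layout and the names `pack4`, `vadd8`, `vdot8` and `matmul_int8` are hypothetical and do not correspond to the unit's actual instructions, which the abstract does not describe. The sketch contrasts wrap-around with saturating addition, and shows how a lane-wise multiply followed by a reduction processes an int8 matrix multiplication four elements at a time.

```python
# Hypothetical software model of a 4-lane, 8-bit SIMD unit whose operands
# live in ordinary 32-bit integer registers. Illustrative only: this is not
# the instruction set of the BSC vector unit.

LANES = 4
INT8_MIN, INT8_MAX = -128, 127


def pack4(values):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 = low byte)."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        assert INT8_MIN <= v <= INT8_MAX
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word into four signed 8-bit lanes."""
    lanes = []
    for i in range(LANES):
        b = (word >> (8 * i)) & 0xFF
        lanes.append(b - 256 if b > INT8_MAX else b)
    return lanes


def vadd8(x, y, saturate=False):
    """Lane-wise 8-bit add; wraps modulo 256 unless saturation is requested."""
    out = []
    for a, b in zip(unpack4(x), unpack4(y)):
        s = a + b
        if saturate:
            s = max(INT8_MIN, min(INT8_MAX, s))  # clamp to the int8 range
        else:
            s = ((s + 128) % 256) - 128  # two's-complement wrap-around
        out.append(s)
    return pack4(out)


def vdot8(x, y):
    """Lane-wise multiply followed by a sum reduction into a wide accumulator."""
    return sum(a * b for a, b in zip(unpack4(x), unpack4(y)))


def matmul_int8(a_rows, bt_rows):
    """C[m][n] = sum_k A[m][k] * B[k][n], with B given transposed so both
    operands are consumed one packed word (four int8 values) at a time.
    Row lengths must be a multiple of 4."""
    packed_a = [[pack4(r[k:k + LANES]) for k in range(0, len(r), LANES)]
                for r in a_rows]
    packed_b = [[pack4(c[k:k + LANES]) for k in range(0, len(c), LANES)]
                for c in bt_rows]
    return [[sum(vdot8(wa, wb) for wa, wb in zip(ra, cb)) for cb in packed_b]
            for ra in packed_a]


x = pack4([100, -100, 5, -5])
print(unpack4(vadd8(x, x)))                 # [-56, 56, 10, -10]
print(unpack4(vadd8(x, x, saturate=True)))  # [127, -128, 10, -10]
print(matmul_int8([[1, 2, 3, 4]], [[1, 1, 1, 1], [1, 0, -1, 0]]))  # [[10, -2]]
```

Storing B transposed is a choice made for this sketch so that both operands are read as contiguous packed words; it is not taken from the described design.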
For the BNN accelerator, we use the TASTE model-based framework to generate, from the ASN.1 description of the data on which we want to run inference, both a correct-by-construction software communication driver on the CPU and the VHDL communication part of the accelerator. The BNN logic itself is implemented manually in VHDL, with a reusable, structured design for fully connected BNN layers. Owing to the properties of BNNs, these operations are ideal for FPGAs, since multiply-and-accumulate operations reduce to XNOR and bit counting. The layer weights are stored in the FPGA block RAM, so they are reused across runs. Synthesis reports an achieved frequency of 114.943 MHz. Preliminary simulation results with a fully connected 512×512 layer show performance improvements of one to two orders of magnitude over the LEON3.
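The reduction from multiply-and-accumulate to XNOR and bit counting follows from encoding each value in {-1, +1} as one bit (1 for +1, 0 for -1): XNOR is 1 exactly where the two signs agree, so for N-element vectors the dot product equals 2·popcount(XNOR(a, w)) - N. The Python sketch below is a software model, not the BSC VHDL design; it applies that identity to a fully connected layer with a sign-style activation. The function names and the per-neuron `thresholds` parameter are hypothetical.

```python
# Software model of a binarised fully connected layer (illustrative sketch,
# not the BSC VHDL design). Each +1/-1 value is stored as one bit:
# bit 1 -> +1, bit 0 -> -1, so an n-element vector is an n-bit integer.


def encode(values):
    """Encode a list of +1/-1 values as an integer (element i -> bit i)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("BNN values must be +1 or -1")
    return word


def xnor_popcount_dot(a, w, n):
    """Dot product of two n-element +1/-1 vectors given as bit encodings.

    XNOR is 1 where the signs agree (product +1); every other position
    contributes -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a ^ w) & mask).count("1")  # popcount of XNOR
    return 2 * matches - n


def bnn_fc_layer(x_bits, weight_rows, n, thresholds=None):
    """Binarised FC layer: one XNOR-popcount per output neuron, followed by a
    sign-style activation whose output bits feed the next layer.

    `thresholds` is a hypothetical per-neuron parameter (e.g. a folded
    batch-norm); by default a neuron fires when its dot product is >= 0.
    """
    if thresholds is None:
        thresholds = [0] * len(weight_rows)
    out = 0
    for j, (w, t) in enumerate(zip(weight_rows, thresholds)):
        if xnor_popcount_dot(x_bits, w, n) >= t:
            out |= 1 << j
    return out


# Usage: three neurons over a 4-element input.
x = encode([1, 1, 1, 1])
rows = [encode([1, 1, 1, 1]), encode([-1, -1, -1, -1]), encode([1, -1, 1, -1])]
print(bin(bnn_fc_layer(x, rows, 4)))  # 0b101 (dot products 4, -4, 0)
```

At one bit per weight, the 512×512 layer mentioned above holds 262,144 weights, i.e. 32 KiB of block RAM, assuming no other parameters are stored alongside them.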
Both designs are currently being ported to Xilinx FPGAs to obtain higher-confidence results, which will be presented at the workshop.
[1] J.-G. Meß et al. Techniques of Artificial Intelligence for Space Applications - A Survey. In OBDP, 2019.
[2] M. Johns et al. A Minimal RISC-V Vector Processor for Embedded Systems. In FDL, 2020.
Session Chairs
Enrico Magli
Politecnico di Torino
David Steenari
ESA