
Session 5a: Evaluation and Benchmarks of Processing Devices and Systems

Tracks
Day 2 - On-Board Processing Benchmarks and AI Acceleration
Tuesday, June 15, 2021
1:30 PM - 2:50 PM

Speaker

Mr Antoine Certain
Airbus Defence and Space

HP4S: High Performance Parallel Payload Processing for Space

1:30 PM - 1:50 PM

Abstract Submission

Next-generation space missions will require more capable computers, either to implement the advanced navigation and control algorithms needed to increase spacecraft autonomy and agility, or, on the payload side, to run complex scientific payload data pre-processing algorithms.
There is therefore strong interest in building efficient and disruptive on-board computers for future applications, based on the exploitation of multicore and manycore devices to increase on-board processing capability while sustaining flexibility through the use of software.
The purpose of this study was to demonstrate the benefits of using OpenMP, one of the most widely known parallel programming models, for the development of parallel space applications, in terms of performance, programmability and portability.
Two main goals were identified:
• Improve overall system performance by exploiting the most advanced parallel embedded architectures targeting the space domain
• Improve parallel programming productivity by reducing the initial development effort of systems based on parallel architectures.
During the project, two representative image processing use cases were selected and successfully ported to the OpenMP parallel programming model with very limited effort and no modification of the actual legacy algorithmic code.
Two promising high-end computing devices were selected: the GR740 from the rad-hard family and, from the COTS family, the latest Kalray Massively Parallel Processing Architecture device, Coolidge, released at the end of Q1 2020.
The OpenMP runtime and open-source observability tools provided by the Barcelona Supercomputing Center were ported from mainstream HPC to the selected hardware targets and exercised through the selected software use cases.
The project demonstrated that the use of OpenMP parallel programming can facilitate the development and analysis of parallel real-time space applications. Development benefits from the embedded runtime, which relieves the programmer of the burden of fine-grained parallelisation orchestration thanks to non-intrusive and portable source code annotations.
The evaluated OpenMP parallel programming model ported to relevant hardware targets accelerates the development, profiling, analysis and execution of parallel real-time space applications, while providing significant performance and portability benefits.
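
As an illustration of the non-intrusive annotation style described above, a minimal sketch in C could look as follows. It is not taken from the actual HP4S use cases; the frame dimensions, function names and the per-pixel operation are hypothetical.

```c
#include <stdint.h>

#define ROWS 2048
#define COLS 2048

/* Hypothetical legacy per-pixel routine: the algorithmic code stays untouched. */
static uint16_t correct_pixel(uint16_t in, uint16_t dark, float gain)
{
    float v = ((float)in - (float)dark) * gain;
    return (v < 0.0f) ? 0 : (uint16_t)v;
}

/* The only change to the legacy loop nest is the OpenMP annotation: compiled
 * without OpenMP support, the pragma is ignored and the code stays sequential. */
void correct_frame(const uint16_t in[ROWS][COLS], const uint16_t dark[ROWS][COLS],
                   uint16_t out[ROWS][COLS], float gain)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            out[r][c] = correct_pixel(in[r][c], dark[r][c], gain);
}
```
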
Some questions remain open for future work amongst which:
• Evaluation of state-of-the-art compiler techniques to guarantee that parallel OpenMP applications are functionally correct and safe, for example free of pathological race conditions or deadlocks.
• Adaptation of the OpenMP runtime libraries to ensure that the timing guarantees devised at analysis time can be preserved at deployment time, and continued maturing of the prototyped instrumentation and observation tools.
• Exploration of complementary OpenMP features such as offloading to FPGAs or other remote computing devices, e.g. neighbouring clusters on a manycore or specialised accelerators (a sketch of this construct follows below).
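
A hypothetical sketch of such offloading with the OpenMP target construct is shown below; the buffer layout and the scaling operation are assumptions for illustration only, not results from the study.

```c
#include <stddef.h>

/* Offload a simple scaling step to an attached accelerator (e.g. a neighbouring
 * manycore cluster exposed as an OpenMP device). The map clause describes the
 * required data movement; on a platform without devices the region falls back
 * to host execution. */
void scale_samples(float *buf, size_t n, float gain)
{
    #pragma omp target teams distribute parallel for map(tofrom: buf[0:n])
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}
```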

Dr. Leonidas Kosmidis
Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC)

GPU4S (GPUs for Space): Are we there yet?

1:50 PM - 2:10 PM

Abstract Submission

In this contribution, we provide an overview of the results and lessons learnt from the on-going ESA-funded GPU4S project (GPUs for Space), performed by BSC as prime contractor and ADS as subcontractor. Embedded GPUs can provide significant computational power at low power for large amounts of data, allowing the use of software for on-board processing. They allow more flexibility and easier reconfiguration than FPGAs, and can support several different processing tasks through reuse of compute resources. Moreover, they can leverage an abundance of specialised developers familiar with widely-used programming models, resulting in an overall lower cost.
The purpose of this exploratory project is to address the increased needs for on-board processing performance of future missions, exploring the possibility of using embedded GPUs in space and studying the initial steps required for their adoption. In particular, our goal is the evaluation of GPU IP for possible future space processors as well as the evaluation of COTS GPUs.
We performed a survey of existing and future algorithms used in space across all divisions of ADS, to identify which domains expect higher needs for performance and whether their algorithms have good characteristics for GPU parallelisation. We concluded that most space algorithms are a good fit for the GPU programming model, something we also confirmed experimentally later.
In another survey, we studied the available hardware solutions and their software ecosystem. We focused on embedded GPU IPs from European providers, to identify the most appropriate one for a radiation-hardened implementation in an ASIC or FPGA in the long term. Additionally, we covered the most important embedded COTS GPU solutions, to identify the most appropriate one for lower-cost, short-term adoption. We also expanded our survey to open source IP and GPU-like solutions. From this extensive coverage we selected a set of embedded GPUs to benchmark.
For this, we have defined GPU4S Bench [1], an open source embedded GPU benchmarking suite consisting of algorithmic building blocks from multiple space domains, identified in our space survey. GPU4S Bench also provides the basis and the optimised implementations of these algorithms for GPUs and multi-core CPUs used in ESA’s OBPMark, an open source benchmarking suite for general on-board processing devices.
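
To illustrate the kind of measurement such a suite performs, a minimal host-side timing harness in C might look like the sketch below. The kernel body, image size and iteration count are hypothetical placeholders and are not taken from GPU4S Bench.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WIDTH  1024
#define HEIGHT 1024
#define ITERS  100

/* Hypothetical building block: a trivial per-pixel operation standing in for a
 * real kernel such as a 2D convolution or an FFT. */
static void building_block(const float *in, float *out, int w, int h)
{
    for (int i = 0; i < w * h; i++)
        out[i] = in[i] * 0.5f + 1.0f;
}

int main(void)
{
    float *in  = malloc(sizeof(float) * WIDTH * HEIGHT);
    float *out = malloc(sizeof(float) * WIDTH * HEIGHT);
    if (!in || !out) return 1;
    for (int i = 0; i < WIDTH * HEIGHT; i++) in[i] = (float)(i % 255);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int it = 0; it < ITERS; it++)
        building_block(in, out, WIDTH, HEIGHT);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Report throughput in a domain-relevant unit (megapixels per second). */
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mpix_per_s = (double)WIDTH * HEIGHT * ITERS / secs / 1e6;
    printf("%.2f Mpixel/s (%.3f s total)\n", mpix_per_s, secs);

    free(in);
    free(out);
    return 0;
}
```
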
In addition to these benchmarks, we ported complex space applications, such as the Euclid NIR, as well as the image processing and CCSDS compression benchmarks from OBPMark, demonstrating that GPUs can significantly benefit existing and, especially, future space processing in terms of performance, power consumption and efficiency. In our contribution we will present a summary of the obtained results.
Finally, we identified issues which need to be addressed for the adoption of GPUs in space, such as radiation effect mitigation, thermal management and procurement of GPU devices, for which we proposed potential solutions and defined a roadmap.
Overall, our conclusion is that embedded GPUs have high potential to provide the performance needed by future missions and can significantly reduce cost while offering new capabilities.

[1] GPU4S Bench: Design and Implementation of an Open GPU Benchmarking Suite for Space On-board Processing: https://www.ac.upc.edu/app/research-reports/public/html/research_center_index-CAP-2019,en.html

Mr. David Steenari
ESA

OBPMark (On-Board Processing Benchmarks) – Open Source Computational Performance Benchmarks for Space Applications

2:10 PM - 2:30 PM

Abstract Submission

Computational benchmarking of on-board processing performance for space applications has often been done on a case-by-case basis, taking into account only a small subset of devices and specific, often proprietary, applications, limiting domain coverage and reproducibility. While commercial benchmarks exist for embedded systems, they are usually limited to CPUs and are based on synthetic algorithms that are not relevant for space. Consequently, they are not generally suitable for assessing highly parallel processors (GPUs, DSPs, etc.) and/or hardware implementations (i.e. ASICs and FPGAs), which are commonplace in space systems.

For on-board processing, there are a number of application types which recur over multiple missions. These applications and algorithms are often driving the overall computational requirements of the mission, e.g. in the case of image and radar processing, RF signal processing and compression. In each case, there are certain performance metrics – such as the number of pixels processed per second – which are well known and easily understandable by designers and users. Finally, with the rise of machine learning in on-board space applications, tasks such as image classification and object detection using SVMs and CNNs are becoming common.

OBPMark (On-Board Processing Benchmarks) defines a set of benchmarks covering the typical classes of applications commonly found on-board spacecraft. The benchmark suite is publicly available to enable easy comparison of different systems and to quickly down-select possible processing solutions for a mission. It is open source and includes multiple implementations, and it is easily extensible, allowing porting and optimization to target platforms, including heterogeneous ones, for fair comparison. Currently, implementations in standard C, OpenMP, OpenCL and CUDA are included.

A technical note defining the algorithms used is also provided to allow implementers to provide additional dedicated versions; it includes reference inputs and outputs for correctness verification, as well as an optional automated launching framework for reproducibility. This also allows the benchmarks to be implemented in FPGAs, while ensuring equivalence with the reference implementations.
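
A verification step of the kind described could, for example, compare an implementation's output against the provided reference data within a tolerance. The sketch below is illustrative only and does not reproduce the actual OBPMark verification code; the function name and tolerance handling are assumptions.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Compare an implementation's output buffer against the reference output.
 * Returns 0 on success. A tolerance absorbs platform-dependent floating-point
 * differences (e.g. between CPU, GPU and FPGA implementations). */
int verify_output(const float *result, const float *reference, size_t n, float tol)
{
    for (size_t i = 0; i < n; i++) {
        if (fabsf(result[i] - reference[i]) > tol) {
            fprintf(stderr, "mismatch at %zu: got %f, expected %f\n",
                    i, result[i], reference[i]);
            return 1;
        }
    }
    return 0;
}
```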

Five categories of benchmarks are defined: 1) Image Processing Pipelines; 2) Standard Compression Algorithms; 3) Standard Encryption Algorithms; 4) Processing Building Blocks; and 5) Machine Learning Inference. In each category, specific benchmarks are included, e.g. both image and radar image compression. Recommended parameters for the CCSDS compression standards 121.0, 122.0 and 123.0 are provided. The processing building blocks include, for example, FIR filters and FFT processing. Two ML applications have been chosen: cloud screening and ship detection. Both will be provided as standard pre-trained machine learning models, in both floating-point and quantized-integer form, to allow support for multiple microarchitectures.
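
As an example of what a Processing Building Blocks entry looks like, a straightforward C reference of a FIR filter is sketched below; the data layout and the "valid region" convention are illustrative choices, not the parameters specified by OBPMark.

```c
#include <stddef.h>

/* Direct-form FIR filter over the valid region: each output sample is the dot
 * product of the n_h filter taps with a window of the input signal, so the
 * output has n_x - n_h + 1 samples and every tap reads valid input data. */
void fir_filter(const float *x, size_t n_x,
                const float *h, size_t n_h,
                float *y)
{
    for (size_t n = 0; n + n_h <= n_x; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < n_h; k++)
            acc += h[k] * x[n + n_h - 1 - k];
        y[n] = acc;
    }
}
```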

The specification of OBPMark has been initiated by ESA together with BSC as an open source project to allow transparent and open performance comparison of devices and systems. The project will also maintain a list of available benchmark results on its open repository.

The work has been carried out both internally at ESA and at BSC through the on-going ESA-funded GPU4S activity, whose optimised versions of algorithmic building blocks, implemented in the open source GPU4S Bench benchmarking suite, were used as a basis.

Mr. Max Ghiglione
Airbus Defence & Space

Machine Learning Application Benchmark for satellite on-board data processing

2:30 PM - 2:50 PM

Abstract Submission

Machine Learning applications are finding their way into demonstration missions such as ESA's Φ-sat, enabled by the use of COTS solutions and the improvement of tools for ML deployment on radiation-tolerant processing units. The satellite industry has been looking at these developments with great interest.
The challenge of implementing such ML applications lies mainly in three points: 1) Processing capabilities on spacecraft hardware are limited, meaning that algorithms need to be optimized for their embedded application. 2) This also poses challenges in terms of the tools used in the development flow, as classical GPU inference is not possible and integration into the industry workflow is complex. 3) Openly accessible and reusable datasets for space missions are limited, as data is either proprietary or poorly labeled.
To address these challenges, a benchmark for ML inference applications in space is proposed.
Such a method would simplify the comparison of algorithms in early development phases, enabling engineers to define the processing power necessary for the desired applications. Moreover, appropriate benchmarking suites will enable the investigation of software tools, various custom reconfigurable IP designs, and COTS solutions for ML inference for on-board data processing.
In the frame of the MLAB project, Airbus, TU Munich, and OroraTech are developing an ML inference benchmark based on the commercial MLPerf method.
In this work, we specifically focus on the description of this benchmark as the main part of the MLAB project, and discuss initial findings and directions with respect to datasets and tools. The benchmark intends to cover a diverse set of algorithms of different complexity, including feature extraction, object detection, classification, tracking, and change detection. This ensures that various space use cases and different computational complexities are represented in the benchmarks.
The benchmarking suite relies on publicly available, large-scale, standardized datasets to ensure the reproducibility of results. Specifically, datasets published in recent years, including BigEarth, the Kaggle Ship Dataset and EuroSAT, satisfy this requirement and are promising candidates for the benchmarks.
In addition, the benchmark is intended to cover various ML inference development and deployment tools. Because optimization plays an important part in the final inference performance, tool choice is crucial to meeting the requirements with the least design effort. For this reason, inference tools such as FINN, Vitis AI, hls4ml and others, featuring different levels of HW design, have to be part of the benchmarking procedure.
Additionally, this benchmark covers the general performance metrics associated with payload applications that use ML inference, including accuracy, throughput, model complexity, computational complexity, resource utilization, and memory footprint. At the same time, the special requirements of on-board computation call for careful consideration of space-specific metrics such as energy efficiency and tail-latency bounds, which will also be addressed in this benchmark.
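
For the latency-related metrics mentioned above, a simple illustration of how per-inference measurements could be reduced to throughput and a tail-latency figure is sketched below; the use of the 99th percentile is an assumed choice for illustration, not a parameter defined by MLAB.

```c
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Reduce a set of per-inference latencies (in milliseconds) to average
 * throughput and a 99th-percentile tail latency. */
void report_latency_metrics(double *lat_ms, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += lat_ms[i];

    qsort(lat_ms, n, sizeof(double), cmp_double);
    size_t idx = (size_t)(0.99 * (double)(n - 1));

    printf("throughput: %.1f inferences/s\n", 1000.0 * (double)n / total);
    printf("p99 latency: %.2f ms\n", lat_ms[idx]);
}
```
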
The described benchmark will represent a crucial tool for the adoption and deployment of ML-based applications in next-generation on-board data processing for future spacecraft missions.


Session Chairs

Mickaël BRUNO
CNES

Clément Coggiola
CNES
