Session 10a: Advances in On-Board Processing Architectures
Tracks
Day 4 - On-Board Processing Architectures
Thursday, June 17, 2021 | 1:30 PM - 2:30 PM
Speaker
Mr. Jimmy LE RHUN
Thales Research & Technology
De-RISC: Launching RISC-V into space
1:30 PM - 1:50 PM
An important challenge faced by mission-critical computers is scaling processing performance while maintaining a high level of dependability in a harsh environment. The adoption of COTS multi-core processors, as in non-critical industries, poses difficulties both in terms of timing interference, caused by concurrent access to shared hardware resources, and in terms of reliability under thermal and radiation stress.
Specific dependability-related features are thus required for space computers. In this domain, the LEON processors [1] are a European success story: they adopt an open architecture, are available as open-source implementations that allow validation by a wide user base, and have fault-tolerant implementations to support missions with high-reliability requirements. The recent RISC-V [2] open-source instruction set architecture is a great opportunity to push this concept further, with renewed potential for growth and wide adoption.
The De-RISC project (Dependable Real-time Infrastructure for Safety-critical Computer) aims at providing the first complete processing platform for space leveraging RISC-V cores and state-of-the-art hypervisor technology. The platform is composed of an FPGA-based SoC with high-performance NOEL-V cores [3], minimized interference channels, and a rich set of space-grade peripherals. The SoC hosts both the XtratuM Next Generation (XNG) hypervisor [4] and the LithOS guest operating system [5] for application isolation and scheduling. In addition, the platform implements advanced monitoring techniques that help ensure real-time behaviour in a multicore context.
In order to validate the platform, in addition to basic benchmarks and tests, a realistic space use case will be deployed, based on the LVCUGEN (Logiciel de Vol Charge Utile GENerique) framework [6], the CNES generic payload software built on Time & Space Partitioning. With CCSDS123 hyperspectral image compression [7] as the high-throughput application and TM/TC communications as the low-latency critical application, it covers a complete mixed-criticality system.
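For readers unfamiliar with Time & Space Partitioning, the sketch below illustrates the static cyclic (major/minor frame) scheduling principle that such a mixed-criticality deployment relies on. It is a minimal simulation, not actual XNG configuration: the partition names, frame lengths, and workloads are hypothetical, and a real hypervisor enforces such a schedule declaratively rather than in application code.

```cpp
// Minimal sketch of a static cyclic (major/minor frame) schedule, the core
// idea behind Time & Space Partitioning. All names and budgets are
// hypothetical; a real TSP hypervisor enforces this at a lower level.
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>
#include <vector>

struct MinorFrame {
    std::string partition;            // which partition owns this time window
    std::chrono::milliseconds budget; // guaranteed CPU time within the major frame
    std::function<void()> work;       // the partition's entry point
};

int main() {
    // Hypothetical 100 ms major frame: the low-latency TM/TC partition gets
    // short, frequent windows; the compression partition gets one long window.
    std::vector<MinorFrame> majorFrame = {
        {"TMTC",     std::chrono::milliseconds(10), [] { /* poll telecommands */ }},
        {"CCSDS123", std::chrono::milliseconds(80), [] { /* compress image block */ }},
        {"TMTC",     std::chrono::milliseconds(10), [] { /* send telemetry */ }},
    };

    for (int cycle = 0; cycle < 3; ++cycle) {       // run a few major frames
        for (const auto& frame : majorFrame) {
            auto start = std::chrono::steady_clock::now();
            std::printf("cycle %d: partition %s (%lld ms budget)\n", cycle,
                        frame.partition.c_str(),
                        static_cast<long long>(frame.budget.count()));
            frame.work();
            // Sleep out the remainder of the window: the schedule is driven by
            // time, not by work completion, so an overrunning partition cannot
            // steal budget from the next one.
            std::this_thread::sleep_until(start + frame.budget);
        }
    }
    return 0;
}
```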
The current status of the project is in line with the plans: the prototype platform is already functional and almost complete, with new features added in scheduled internal releases. The validation phase has started, and will proceed incrementally until the end of the project. The commercial release of the platform is expected for Q2 2022.
References:
[1] https://www.gaisler.com/index.php/products/processors/leon5
[2] https://riscv.org/
[3] https://www.gaisler.com/index.php/products/processors/noel-v
[4] https://fentiss.com/products/hypervisor/
[5] https://fentiss.com/products/lithos/
[6] Julien Galizzi, Jean-Jacques Metge, Paul Arberet, Eric Morand, Fabien Vigeant, et al. LVCUGEN (TSP-based solution) and first porting feedback. Embedded Real Time Software and Systems (ERTS2012), Feb 2012, Toulouse, France.
[7] Lucana Santos Falcon, Roberto Camarero. Introduction to CCSDS compression standards and implementations offered by ESA. European Workshop on On-Board Data Processing (OBDP2019), Feb 2019, Noordwijk, Netherlands.
Mr. Michele Caon
Politecnico Di Torino
Low Latency On-Board Data Handling for Earth Observation Satellites using Off-the-Shelf Components
1:50 PM - 2:10 PM
Satellite Earth Observation (EO) is nowadays receiving significant attention. In this regard, the latency with which EO products are delivered to the ground segment is undoubtedly among the key performance indicators for these systems. Traditionally, small EO satellites rely on the flight segment for raw data acquisition and compression, while the image processing tasks are performed at the ground segment. The latency of raw data transmission prevents such systems from achieving better than Near Real-Time (NRT) delivery of EO products, which are typically available to the end user 1 h to 3 h after acquisition.
The European Union Horizon 2020 EO-ALERT project aims at significantly reducing this latency by moving all the critical processing tasks to the flight segment and accelerating them using high-performance commercial off-the-shelf (COTS) devices. The resulting architecture minimizes the amount of transmitted data and eliminates ground-based data processing from the EO data chain, achieving true real-time product delivery in less than 5 min for both optical and Synthetic Aperture Radar (SAR) data.
The centerpiece of the proposed architecture is the embedded CPU Scheduling, Compression, Encryption, and Data Handling (CS-CEDH) Subsystem, which fulfils two roles: 1) acquiring and moving images and products among the image processing and communications subsystems, thereby also coordinating their tasks; and 2) compressing and encrypting the input and output data with different settings depending on the mission requirements. From a design and resource allocation perspective, these roles are complementary: the former is software-focused, aiming at the modularity, flexibility, and dynamic scalability required by the inherently event-driven, real-time nature of the CPU Scheduling processes, while the latter is a highly specialized, computationally expensive data-processing function, better suited to hardware implementation. To achieve the overall goal of minimizing system-level latency, it is therefore mandatory to co-design a mixed hardware/software solution, leveraging the performance of state-of-the-art COTS Multi-Processor System-on-Chip (MPSoC) devices featuring a Processing System (PS) directly interfaced to a Programmable Logic (PL) unit. Thanks to state-of-the-art Electronic Design Automation (EDA) tools, such a platform makes it possible to obtain a Register-Transfer Level (RTL) model, to be deployed on the PL unit, directly from a high-level, hardware-friendly software model through High-Level Synthesis (HLS). This effectively shifts a software-only design to a more efficient, higher-performance hybrid hardware/software one, without sacrificing run-time tuning of the compression and encryption parameters and while still meeting all the system-level requirements. The PS hosts the scheduling and data handling software and seamlessly offloads data-intensive tasks to the PL, so that most of the CPU resources can be dedicated to compressing, encrypting, and transmitting small, high-priority EO products (alerts) with very low latency, while larger data (e.g., generated images) are processed at high throughput.
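To make the HLS flow described above concrete, the following sketch shows the kind of HLS-friendly C++ a designer would hand to the EDA tools: static loop bounds, no dynamic memory, and a pipeline directive. The actual EO-ALERT compression and encryption kernels are not public, so the byte-wise delta predictor, the BLOCK_SIZE constant, and the Vitis-HLS-style pragma are illustrative assumptions only.

```cpp
// Hedged sketch of an HLS-friendly processing stage: plain C++ with static
// bounds and no dynamic memory, which HLS tools can turn into a pipelined
// RTL block for the PL. The delta predictor stands in for a real
// compression front-end; BLOCK_SIZE is a hypothetical value.
#include <cstddef>
#include <cstdint>

constexpr std::size_t BLOCK_SIZE = 4096; // hypothetical fixed block length

// Compute residuals against the previous sample, a typical front-end step of
// predictive compressors. The fixed trip count and the single loop-carried
// dependency make the loop a good candidate for '#pragma HLS PIPELINE II=1'.
void delta_encode(const uint8_t in[BLOCK_SIZE], uint8_t out[BLOCK_SIZE]) {
    uint8_t prev = 0;
    for (std::size_t i = 0; i < BLOCK_SIZE; ++i) {
        // #pragma HLS PIPELINE II=1   (Vitis-HLS-style directive)
        out[i] = static_cast<uint8_t>(in[i] - prev); // wraps modulo 256
        prev = in[i];
    }
}

int main() {
    uint8_t in[BLOCK_SIZE] = {};
    uint8_t out[BLOCK_SIZE];
    for (std::size_t i = 0; i < BLOCK_SIZE; ++i)
        in[i] = static_cast<uint8_t>(i);        // ramp test signal
    delta_encode(in, out);
    return out[1] == 1 ? 0 : 1; // residual of a ramp is constant 1
}
```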
Here we present the first promising results of this design methodology applied to the CS-CEDH Subsystem of the EO-ALERT architecture. In particular, its contribution to the alert provision latency is under 1 s in every foreseen application scenario. At the same time, the hardware compression/encryption accelerator achieves a 6- to 7-fold speed-up over a software-optimized implementation when the processed images are also compressed, encrypted, and transmitted.
Dr. Pablo Ghiglino
Klepsydra Technologies GmbH
Lock-free pipelining for onboard data processing - A low-power and high-throughput alternative to OpenMP parallelisation for processor-intensive tasks
2:10 PM - 2:30 PM
Onboard computers are expected to perform more and more processor-intensive tasks autonomously. Examples can be found in a broad range of applications: efficient compression of Earth Observation data, system anomaly and fault detection, data encryption, sensor data fusion, etc. Moreover, the space industry's increasing interest in Artificial Intelligence (AI) has only exacerbated the problem. Vision-based navigation in space, in-orbit validation of Earth Observation data, and signal-to-noise ratio (SNR) reduction are only some examples of the need for AI in space.
The space industry, like many other industries, has focused exclusively on standard parallelisation techniques to accelerate processor-intensive algorithms. Parallelisation tools like OpenMP are used to break down a specific operation within the algorithm, e.g. a matrix multiplication, into smaller pieces that are processed by several parallel threads. This technique ensures a low processing time for the algorithm. However, it offers limited control over CPU usage and, moreover, no control over data throughput.
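As a point of reference, a minimal example of this parallelisation style is shown below: a single matrix multiplication split across threads by output rows. The matrix size and the row-wise work split are illustrative assumptions, not the benchmark configuration used by the authors.

```cpp
// Minimal OpenMP example: one operation (a matrix multiplication) is broken
// into row slices processed by parallel threads. Compile with -fopenmp.
// N is an arbitrary illustrative size.
#include <cstdio>
#include <vector>

int main() {
    constexpr int N = 512;
    std::vector<double> a(N * N, 1.0), b(N * N, 2.0), c(N * N, 0.0);

    // Each thread computes a slice of output rows. OpenMP controls how many
    // threads run, but not the throughput of a stream of such operations,
    // which is the limitation the abstract points out.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                c[i * N + j] += a[i * N + k] * b[k * N + j];

    std::printf("c[0] = %f (expected %f)\n", c[0], 2.0 * N);
    return 0;
}
```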
In this work the authors propose a well-known data-processing paradigm that has recently attracted renewed interest in academia: pipelining. Pipelining, as opposed to parallelisation, breaks an algorithm down into steps, or operations, and assigns a thread to each of these steps. A pipeline resembles an assembly line, where each thread is in charge of one specific part of the product. Academic research shows significant improvements in the overall performance of pipelined algorithms.
Moreover, the authors made use of a lock-free data-processing framework to implement a novel pipelining framework that enhances onboard processing performance even further. This led to ground-breaking results in terms of CPU usage and data throughput: a 20%-60% reduction in CPU usage and a 2x-8x increase in data processing rate.
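Since the authors' framework itself is proprietary, the following minimal sketch only illustrates the general mechanism behind lock-free pipelining: two stages, one thread each, connected by a single-producer/single-consumer ring buffer that coordinates through atomics rather than locks. The stage workloads, queue capacity, and spin-wait policy are illustrative assumptions.

```cpp
// Sketch of a lock-free two-stage pipeline: stages hand data over through a
// single-producer/single-consumer (SPSC) ring buffer built on std::atomic,
// so no locks are taken on the data path.
#include <array>
#include <atomic>
#include <cstdio>
#include <thread>

template <typename T, std::size_t N>   // N must be a power of two
class SpscQueue {
    std::array<T, N> buf_;
    std::atomic<std::size_t> head_{0}, tail_{0};
public:
    bool push(const T& v) {            // called only by the producer thread
        auto t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false; // full
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& v) {                   // called only by the consumer thread
        auto h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;     // empty
        v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};

int main() {
    SpscQueue<int, 1024> q;
    const int count = 100000;

    std::thread stage1([&] {           // stage 1: produce (e.g., acquire data)
        for (int i = 0; i < count; ++i)
            while (!q.push(i)) { /* spin: queue full */ }
    });
    long long sum = 0;
    std::thread stage2([&] {           // stage 2: consume (e.g., compress)
        int v;
        for (int n = 0; n < count; ++n) {
            while (!q.pop(v)) { /* spin: queue empty */ }
            sum += v;
        }
    });
    stage1.join();
    stage2.join();
    std::printf("sum = %lld\n", sum);
    return 0;
}
```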
The authors present two validation experiments: first, an algorithm consisting of a number of matrix multiplications performed sequentially; second, an AI network for asteroid pose estimation using data from a past space mission. The experiments were benchmarked against OpenMP for the matrix multiplication example and against TensorFlow Lite for the AI algorithm. In both cases, the lock-free pipelined approach presented here substantially outperformed OpenMP and TensorFlow Lite, in terms of both CPU usage and data throughput.
Session Chairs
Thomas Firchau
DLR
Patricia Lopez Cueva
Thales Alenia Space France