
Session 3: On-Board Processing Algorithms and Implementations

Tracks
Day 1 - Onboard processing applications
Monday, June 14, 2021
3:45 PM - 4:45 PM

Speaker

Prof. Enrico Magli
Politecnico di Torino

COMPRESSIVE IMAGING AND DEEP LEARNING BASED IMAGE RECONSTRUCTION METHODS IN THE “SURPRISE” EU PROJECT

3:45 PM - 4:05 PM

Abstract Submission

The Horizon 2020-funded SURPRISE project targets the development and demonstration of a geostationary compressive optical imaging instrument in the visible, NIR and MIR spectral bands. The instrument leverages the compressed sensing paradigm in several ways: first, the aim is to perform native data compression and encryption, thereby avoiding the need for dedicated hardware; second, the instrument targets super-resolution reconstruction, achieving a number of spatially resolved pixels that is larger than the number of detectors; third, the instrument is intended to be capable of onboard analysis of the acquired scenes in order to detect events of interest.
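As a purely illustrative aside, the compressed sensing acquisition model underlying this concept can be sketched in a few lines of Python: the scene is observed through a set of pseudo-random binary projections whose number M is smaller than the number of pixels N, so that acquisition itself performs the compression. The sizes below are assumptions for illustration, not SURPRISE design parameters.

```python
import numpy as np

# Minimal sketch of a compressed-sensing acquisition (not the SURPRISE optical
# design): M pseudo-random binary projections of an N-pixel scene, with M < N,
# so that the acquisition itself acts as compression.
rng = np.random.default_rng(0)
N = 64 * 64                        # number of spatially resolved pixels (assumed)
M = N // 4                         # number of measurements (assumed 4x compression)

x = rng.random(N)                  # toy scene, flattened to a vector
Phi = rng.integers(0, 2, (M, N))   # binary measurement patterns, e.g. micro-mirror states
y = Phi @ x                        # compressed measurements seen by the detector
print(x.shape, y.shape)            # (4096,) (1024,)
```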

This paper presents an overview of the SURPRISE instrument concept and the related demonstrator, and an in-depth analysis of the related reconstruction algorithms. Specifically, we focus on the super-resolution imaging approach, and leave the discussion of encryption and onboard analysis for subsequent papers. Indeed, while compressed sensing has traditionally employed sparsity-based methods to recover the images from a set of random projections, the performance of these methods has often been unsatisfactory. A new generation of deep learning methods has shown that it is possible to learn strong priors from training data, obtaining reconstruction accuracy far better than that exhibited by the conventional methods.
This paper reports on the performance and the features of a deep learning method selected for image reconstruction. The method is based on the ISTA-NET+ neural network, which has been suitably generalized to match the optical design of the SURPRISE instrument and the related functional requirements. ISTA-NET+ mimics a few unrolled iterations of an iterative shrinkage method. Unlike traditional methods, which employ a given fixed prior for reconstruction, e.g. sparsity or total variation, the deep learning approach learns from a training set the domain that is most suitable for image reconstruction.
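For readers unfamiliar with the unrolling idea, the sketch below shows one classical ISTA iteration in NumPy: a gradient step on the data-fidelity term followed by soft-thresholding. ISTA-NET+ unrolls a small number of such iterations into network layers and replaces the fixed sparsifying transform and threshold with learned ones; the specific generalization adopted for SURPRISE is not reproduced here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding (shrinkage) operator used by ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_step(x, y, Phi, step, lam):
    """One classical ISTA iteration: gradient step on ||Phi x - y||^2 followed
    by shrinkage. Deep unrolled networks such as ISTA-NET+ turn a few of these
    iterations into layers with learned transforms and thresholds."""
    r = x - step * Phi.T @ (Phi @ x - y)   # gradient descent on the data term
    return soft_threshold(r, step * lam)   # shrinkage enforcing the sparsity prior
```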

We show results of image sensing and reconstruction in a variety of conditions, and discuss the impact of micro-mirror type, model noise and the amount of super-resolution. Current results on Earth observation images indicate that deep learning methods largely outperform existing methods. We also discuss the ability of the deep learning model to reconstruct images whose type is different from those employed during the training stage. Moreover, while traditional methods are typically iterative and therefore computationally intensive, deep learning methods are much simpler and can be trained to be robust to noise. In the final paper we will also include results on simulated images having the same optical characteristics as those acquired via the demonstrator, which is currently in its advanced design stage.
Dr. Domenik Helms
OFFIS - Institut für Informatik

A novel toolbox automating the FPGA design of an ultra-low-power, low-latency, block-memory-free implementation of a 1-dimensional stream-processing CNN.

4:05 PM - 4:25 PM

Abstract Submission

The timing and power of an embedded neural network application are usually dominated by the access time and the energy cost per memory access. From a technical point of view, the hundreds of thousands of look-up tables (LUTs) of a field-programmable gate array (FPGA) are nothing more than small memory blocks that are fast and energy-efficient to access. If the accesses to the block memory can be reduced or, as in our case, avoided altogether, the resulting neural network computes much faster and at far lower energy cost.
We have therefore developed a design scheme that uses precomputed convolutions and stores them in the LUT memories. This allows small (mostly one-dimensional) convolutional neural networks (CNNs) to be executed without block memory accesses: activations are stored in the local per-LUT registers, and the weights and biases of all neurons are encoded in the lookup tables. Each neuron is assigned its own exclusive share of logic circuits. This completely avoids the need for memory accesses to reconfigure a neuron with new weights and allows us to perform weight optimisations at design time. However, it limits the applicability of the overall method to comparatively small neural networks, since several LUTs are needed per neuron and even the largest FPGAs only provide hundreds of thousands of LUTs.
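As a conceptual illustration of this idea (the actual bit widths, activation grouping and encoding used by the toolbox are not stated in the abstract), one can precompute, for a small group of 1-bit activations, the partial dot product for every possible input pattern and store the results as the contents of a single lookup table:

```python
import itertools
import numpy as np

def lut_partial_sums(weights):
    """Illustration only: for a group of 1-bit activations, precompute the
    partial dot product for every possible input pattern, as it could be
    stored in one FPGA lookup table at design time."""
    n = len(weights)
    return np.array([np.dot(bits, weights)
                     for bits in itertools.product((0, 1), repeat=n)])

table = lut_partial_sums([0.5, -1.0, 0.25, 0.75, -0.5, 1.0])  # one 6-input group
print(table.size)  # 64 precomputed entries, one per activation pattern
```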
To make this "in-LUT processing" possible, we had to limit the set of available neural network functions. We have identified and implemented a set of functions that are sufficient to make the neural network work, but which can all be implemented efficiently in an FPGA without memory access. Our philosophy is that it is better to adapt the network to the FPGA during training, making the best use of the limited resources available, than to try to optimise in hardware the functions resulting from an unconstrained neural network.
To make this design scheme usable, we had to develop a set of design tools that help the AI designer convert a given reference AI in TensorFlow into an equivalent network of the available hardware functions, and to fine-tune the AI to compensate for the accuracy loss from changing the implementation. The two most powerful optimization techniques we applied are variable bit-width quantization and a depth-wise separation of the convolution.
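Since the reference networks are given in TensorFlow, the depth-wise separation mentioned above can be sketched directly with Keras layers; the layer sizes below are assumptions for illustration and are not taken from the paper.

```python
import tensorflow as tf

# Depth-wise separation of a convolution: same output shape, far fewer
# multiply-accumulates and parameters than a standard convolution.
standard = tf.keras.layers.Conv1D(16, kernel_size=9, padding="same")
separable = tf.keras.layers.SeparableConv1D(16, kernel_size=9, padding="same")

x = tf.random.normal((1, 1024, 8))   # assumed 8-channel input stream
standard(x); separable(x)            # build both layers
print(standard.count_params(), separable.count_params())  # 1168 vs 216 parameters
```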
In order to demonstrate the performance of this method, we implemented a CNN-based ECG detection. Our implementation used only 40% of the available LUTs on the Spartan S15 chip and none of the block RAM or DSP circuits. The system processed 500 pre-recorded ECGs of 5575 samples each in 281 ms, using only 73 mJ in total, resulting in 10 million samples per second and an energy cost of 26.2 nJ per sample.
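These throughput and energy figures follow directly from the reported totals, as the short check below shows.

```python
samples = 500 * 5575        # total ECG samples processed
time_s = 281e-3             # total processing time (281 ms)
energy_j = 73e-3            # total energy (73 mJ)

print(samples / time_s)           # ~9.9e6 samples per second ("10 million")
print(energy_j / samples * 1e9)   # ~26.2 nJ per sample
```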
Mr. Gogu Dragos-Georgel
GMV Innovating Solutions

Boosting Autonomous Navigation solution based on Deep Learning using new rad-tol Kintex Ultrascale FPGA

4:25 PM - 4:45 PM

Abstract Submission

D. Fortun Sanchez, D. Gonzalez-Arjona, F. Stancu, D. Gogu
GMV Innovating Solutions S.R.L, Bucharest, Romania
daniel.fortun.sanchez@gmv.com, {dgarjona, fstncu, dgogu}@gmv.com
O. Müler, M. Barbelian
University Politehnica of Bucharest
octavian.grigore-muler@upb.ro, barbelian_m@avianet.ro

Abstract

In this paper we present an ad-hoc architecture for an on-board Deep Learning (DL) network implemented on a rad-tol FPGA, creating building blocks that are re-usable for different AI solutions or networks to be developed in the future.
The problem analysed is based on an autonomous descent and landing scenario, with the aim of comparing against traditional techniques. The implementation of the Deep Learning algorithm is focused on the extraction of features in navigation camera images. The FPGA solution allows a reduced power consumption while maximizing the execution performance, in contrast to many on-ground solutions. A space-representative breadboard is being prepared to demonstrate the solution.
For training, testing and validation of the Deep Learning (DL) network, the North Pole of the Moon's surface has been selected; one trajectory is used to train the DL network and a second one to validate it.
The architecture of the neural network is divided into numerous layers and is based on Processing Units (PUs), where one layer can have multiple PUs. Each PU can perform operations such as Convolution, MaxPooling and Upsampling.
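A minimal sketch of these three operations, expressed here as Keras layers purely for illustration (filter counts and input size are assumed, not taken from the paper):

```python
import tensorflow as tf

# The three operations a Processing Unit supports, chained in sequence.
conv = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")
pool = tf.keras.layers.MaxPooling2D(2)
upsample = tf.keras.layers.UpSampling2D(2)

x = tf.random.normal((1, 128, 128, 1))   # assumed navigation-camera patch
y = upsample(pool(conv(x)))              # Convolution -> MaxPooling -> Upsampling
print(y.shape)                           # (1, 128, 128, 8)
```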
The implementation is composed of a set of PUs, three DSPs and a controller. The controller is responsible for coordinating read and write commands to the external memory, as well as deciding which operation to execute.
The lower memory addresses are allocated to store the parameters necessary for the operations (weights and biases) once the FPGA is initialized, while the following memory addresses are reserved for the input images. Moreover, each layer has a fraction of the RAM reserved for it.
Autonomous navigation based on DL is a complex implementation and presents many challenges, but two of them are of utmost importance: FPGA resources and timing performance. The two are strongly correlated, and the design is tailored to balance the bottleneck, the output performance based on timing requirements and the number of accesses to external memory.
This architecture requires a huge amount of arithmetic resources, and not even the biggest space-grade FPGA on the market provides enough resources, such as DSPs and BRAMs, to perform all operations at once.
The relation between resources and timing is about finding the balance between the two. The PU-based architecture tries to minimize the latency of the operations, especially for the Convolution module, while at the same time minimizing resources. The more PUs are instantiated, the fewer accesses to external memory are needed.
Overall, this paper presents an analysis of the performance of a DL network in which vision-based navigation algorithms are implemented on FPGA hardware. The main goal of this activity is to reduce the computational load of on-board processors, starting from an architecture, resource and timing analysis between the SW and HW implementations.



Session Chairs

Roberto Camarero
ESA
