Speaker
Description
Reliability in digital circuits that operate in radiation-prone environments comes at a significant cost. The classical generic solution, triple modular redundancy, triples the cost.
In this work, we present a novel perspective on enhancing the fault tolerance of digital circuits by employing iterative structures in the processing data-paths. We start with a reliability analysis of LDPC decoders. These forward-error-correction circuits implement belief propagation algorithms, in which messages are passed between processing nodes over multiple iterations. We show that these iterative decoders can tolerate error rates of up to 10^-5 with a negligible decrease in error correction capability.
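As a rough illustration of why an iterative decoder can absorb occasional hardware faults, the sketch below runs a hard-decision bit-flipping decoder in which every parity-check result may be flipped with a given probability. It is a simplified stand-in, not the belief propagation decoder analysed in the talk: the (7,4) Hamming matrix replaces a real LDPC matrix, and the fault model (independent flips of check results) as well as the names `faulty_checks`, `decode` and `failure_fraction` are assumptions made for the example.

```python
import random

# Parity-check matrix of the (7,4) Hamming code, used only as a tiny
# stand-in for a real (much larger and sparser) LDPC matrix.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
CODEWORD = [1, 1, 1, 0, 0, 0, 0]  # satisfies all three checks

def faulty_checks(word, fault_rate, rng):
    """Evaluate each parity check; a result is flipped with probability
    fault_rate (assumed fault model for this sketch)."""
    checks = []
    for row in H:
        parity = sum(h & b for h, b in zip(row, word)) % 2
        if rng.random() < fault_rate:
            parity ^= 1  # injected hardware fault
        checks.append(parity)
    return checks

def decode(received, fault_rate=0.0, max_iters=10, rng=None):
    """Iterative bit flipping: each pass flips the bit whose checks are
    most in disagreement, until every check reports satisfied."""
    if rng is None:
        rng = random.Random(0)
    word = list(received)
    for _ in range(max_iters):
        checks = faulty_checks(word, fault_rate, rng)
        if not any(checks):
            break

        def metric(j):
            # unsatisfied minus satisfied checks that involve bit j
            involved = [checks[i] for i in range(len(H)) if H[i][j]]
            return 2 * sum(involved) - len(involved)

        word[max(range(len(word)), key=metric)] ^= 1
    return word

def failure_fraction(fault_rate, trials=10000, seed=1):
    """Fraction of single-bit channel errors left uncorrected."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        received = list(CODEWORD)
        received[rng.randrange(len(received))] ^= 1  # one channel error
        if decode(received, fault_rate, rng=rng) != CODEWORD:
            failures += 1
    return failures / trials
```

Calling `failure_fraction` with increasing `fault_rate` values gives a feel for how the decoding failure fraction of this toy decoder changes with the injected fault probability. The figures it produces say nothing about the LDPC results reported in the talk.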
The second part of the presentation discusses a novel approach to fault tolerance in processing data-path pipelines, based on control engineering feedback loops. For this purpose, we rely on a correction controller that computes correction factors and applies them to a subset of designated registers of the processing pipeline. In order to design the correction controller and to select the subset of registers subject to correction, we model the faulty processing data-path as a process with perturbations, for which we develop a dynamic model. Based on this model, we determine the output states, represented by the registers that are compared to a reference, and the corrected states, represented by the registers to which the correction factors are added. The correction process is performed in several iterations, by rewinding the data-path computation in the blocks that have non-zero correction factors. A key issue is that the controller needs a reference. The workaround is to employ two controlled data-paths in reaction.
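The feedback idea can be sketched on a toy linear data-path. In the example below, a fault is modelled as an additive perturbation on a pipeline register, the last register is the output state compared to a reference, and register 0 is the single corrected state. The three blocks, the integral-style controller with a fixed `gain`, and the names `forward`, `rewind` and `controlled_run` are all hypothetical choices for illustration, not the controller design presented in the talk. The reference is simply passed in, which sidesteps the reference problem noted above.

```python
# Hypothetical three-block data-path; each block feeds one pipeline register.
BLOCKS = [
    lambda x: x + 3.0,   # block 0 -> register 0 (the corrected state)
    lambda x: 2.0 * x,   # block 1 -> register 1
    lambda x: x - 5.0,   # block 2 -> register 2 (the output state)
]
CORRECTED = 0  # index of the register that receives the correction factor

def forward(x, upsets):
    """Run the whole data-path once and return all register values.
    `upsets` maps a register index to an additive perturbation (the fault)."""
    regs = []
    for i, block in enumerate(BLOCKS):
        x = block(x) + upsets.get(i, 0.0)
        regs.append(x)
    return regs

def rewind(regs, correction, upsets):
    """Re-run only the blocks downstream of the corrected register."""
    x = regs[CORRECTED] + correction
    for i in range(CORRECTED + 1, len(BLOCKS)):
        x = BLOCKS[i](x) + upsets.get(i, 0.0)
    return x

def controlled_run(x, reference, upsets, gain=0.25, max_iters=50, tol=1e-9):
    """Accumulate a correction factor until the output matches the reference."""
    regs = forward(x, upsets)
    correction = 0.0
    output = regs[-1]
    iters = 0
    while abs(reference - output) > tol and iters < max_iters:
        correction += gain * (reference - output)  # integral-style update
        output = rewind(regs, correction, upsets)  # rewind affected blocks
        iters += 1
    return output, correction, iters
```

With input 1.0 the fault-free output is 3.0. A perturbation of +6 on register 1, for example `controlled_run(1.0, 3.0, {1: 6.0})`, is driven out over successive iterations because each rewind halves the output error for this particular gain and data-path.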