
In harsh space environments, especially in the geostationary orbit (GEO) populated by telecom satellites, non-hardened microelectronics suffers from transients and soft errors. Although space electronics is normally hardened and equipped with fault tolerance and soft-error mitigation schemes, radiation damage and wear-out accumulate over time and eventually lead to intolerable permanent defects that render the target application dead. However, the presence of permanent defects does not necessarily mean end of life if the electronic system is aware of its health status and can therefore cope with reduced processing capacity. The current technology trend in microelectronics is toward system resilience that, in addition to physical hardening and classical fault tolerance, also exploits the extensive natural redundancy of multi-core systems together with smarter task scheduling.
In the SoC-HEALTH project, Testonica is to develop a demonstrator of fine-grain in-situ On-Chip Fault Management (OCFM) that acts as the backbone of SoC (system-on-chip) Health Awareness. The SoC Health Awareness concept relies on reusing the multitude of typical sensors and checkers already embedded deep in the hardware to measure operating parameters of the target IP core and to detect and correct errors before they manifest at the application software level. The concept also includes on-chip health monitoring of the hardware, maintenance of a SoC health map, and adaptation of the OS and software to the reduced capacity of SoC sub-modules and sub-systems, keeping the mission alive until the electronics is completely dead.
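To make the idea of a health map and capacity-aware adaptation more concrete, here is a minimal sketch in Python. It is not the project's implementation: the class `HealthMap`, the three health states, the `degrade_threshold` value and the round-robin `assign_tasks` helper are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"   # still usable, but has accumulated corrected errors
    FAILED = "failed"       # permanent fault, must not be scheduled


@dataclass
class HealthMap:
    """Hypothetical per-core health map (illustrative names and thresholds)."""
    cores: dict = field(default_factory=dict)             # core id -> Health
    corrected_errors: dict = field(default_factory=dict)  # core id -> count
    degrade_threshold: int = 3  # assumed: corrected errors before DEGRADED

    def add_core(self, core_id):
        self.cores[core_id] = Health.HEALTHY
        self.corrected_errors[core_id] = 0

    def report_corrected_error(self, core_id):
        # A checker corrected an error; enough of them mark the core degraded.
        self.corrected_errors[core_id] += 1
        if (self.corrected_errors[core_id] >= self.degrade_threshold
                and self.cores[core_id] is Health.HEALTHY):
            self.cores[core_id] = Health.DEGRADED

    def report_permanent_fault(self, core_id):
        self.cores[core_id] = Health.FAILED

    def usable_cores(self):
        return sorted(c for c, h in self.cores.items() if h is not Health.FAILED)


def assign_tasks(tasks, health_map):
    """Round-robin tasks over the cores that are still usable."""
    cores = health_map.usable_cores()
    if not cores:
        raise RuntimeError("no usable cores left")
    return {task: cores[i % len(cores)] for i, task in enumerate(tasks)}
```

With eight cores and one of them marked as failed, `assign_tasks` simply spreads the workload over the remaining seven, which is the "reduced capacity" behaviour described above in its simplest form. A real scheduler would also weigh degraded cores differently; that is left out here.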
The new Health Awareness technology is expected to rapidly expand the frontiers of real-life applications within 5-10 years. The SoC-HEALTH technology acts as middleware between HW checkers/sensors and mission applications, providing Health Awareness functions for next-generation multi-core processors in the space industry as well as in the terrestrial economy, especially in domains requiring high availability of electronics.
Degradation monitoring and fault detection may take place at different system levels: hardware (HW) and software (SW). Still, the key to successful OCFM is the ability to collect and process data from dozens or even hundreds of on-chip sensors and checkers simultaneously and in real time, independent of system size and configuration, which is not a trivial task.
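One way to keep fault localization cheap as the number of checkers grows is to aggregate fault flags hierarchically and descend only into flagged branches. The sketch below illustrates that general technique in Python; it is an assumption made for illustration, not a description of the SoC-HEALTH design, and the names `Node`, `flag` and `locate_faults` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A segment of the monitoring hierarchy, or a leaf checker."""
    name: str
    fault: bool = False                      # set on leaf checkers only
    children: List["Node"] = field(default_factory=list)

    def flag(self):
        # Aggregated flag: True if this node or any descendant reports a fault.
        # In hardware this would be an OR of the child flags; here it is
        # recomputed recursively for simplicity.
        return self.fault or any(c.flag() for c in self.children)


def locate_faults(node):
    """Return names of faulty leaves, skipping branches whose flag is clear."""
    if not node.flag():
        return []            # nothing raised below this point
    if not node.children:
        return [node.name]   # a faulty leaf checker
    found = []
    for child in node.children:
        found.extend(locate_faults(child))
    return found
```

When no checker has raised a fault, `locate_faults` returns after inspecting only the top-level flag, so the fault-free case costs the same regardless of how many checkers sit below.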
The key target of the project is to build, on a Kintex-7 FPGA, a demonstration platform based on an octa-core LEON3 CPU augmented with OCFM facilities, and to enhance the Linux OS/kernel with fault management functionality. With that, Testonica aims to demonstrate that monitoring, diagnostic and fault management functions based on the underlying IEEE 1687 infrastructure and instrumentation can deliver high-performance fault management while keeping the overhead in system performance and cost reasonable.
Project Fact Sheet
- Budget: 200 kEUR
- Duration: August 2018 - February 2020
- Partners: European Space Agency ESTEC (end user), Testonica Lab (contractor)
- Contract Number: Incentive Scheme / EXPRO+ and GSTP activity 4000124897/18/NL/CBi
- Project page: https://testonica.com/research/SoC-Health
- Final Report download: 4000124897_SoC_HEALTH_FR.pdf
