Abstract:
As microcontrollers (MCUs) are increasingly used in spacecraft control applications, their sensitivity to radiation and the associated risk of system failure become a growing concern. In a radiation environment, the MCU is vulnerable to impacts from high-energy particles, which can lead to single-event effects (SEEs) that disrupt normal system operation. The MCU pipeline, being the core structure of the system, is particularly susceptible to single-event upsets (SEUs), which can cause execution failures. However, existing radiation-hardening techniques offer limited effectiveness for pipelines. To enhance SEU resistance, this study focused on a 32-bit MCU core with eight pipeline stages and proposed a pipeline hardening approach that uses lockstep technology to improve fault tolerance. Signals from the two processors, including register write data, register contents, and pre-fetched instructions, were compared. Any discrepancy triggered an error flag to indicate a fault. When an error flag was raised, recovery was initiated through an interrupt. The interrupt handler then retrieved state information from the advanced peripheral bus (APB) slave module to restore the CPU’s operational state and resume execution. By combining hardware-based state preservation with software-driven error recovery, the proposed solution demonstrated significant improvements in fault tolerance rates and performance compared to traditional checkpoint-based techniques. After the pipeline hardening design was completed, a fault injection platform was used to simulate real-world error conditions in the internal processor modules. The platform was developed based on the circuit’s register-transfer-level (RTL) design and statistical results. It operated by automatically locating all registers within the target design and forcing their values to upset, on a timescale of tens of nanoseconds, in the RTL description of the circuit.
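The compare-flag-recover flow and the register upsets described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' RTL or platform: the register file is a plain Python list, `flip_bit`, `lockstep_mismatches`, and `inject_and_recover` are hypothetical names, and a saved copy of the registers stands in for the state held in the APB slave module.

```python
import random

WORD_BITS = 32  # the abstract describes a 32-bit MCU core


def flip_bit(value, bit):
    """Return value with one bit inverted, modelling a single-event upset."""
    return (value ^ (1 << bit)) & 0xFFFFFFFF


def lockstep_mismatches(main_regs, shadow_regs):
    """Indices of registers whose contents differ between the two cores."""
    return [i for i, (a, b) in enumerate(zip(main_regs, shadow_regs)) if a != b]


def inject_and_recover(regs, rng):
    """Upset one random bit in one core, detect it, and restore saved state.

    Hypothetical helper: saved_state is an assumed stand-in for the state
    information kept in the APB slave module.
    """
    main = list(regs)
    shadow = list(regs)
    saved_state = list(regs)

    target = rng.randrange(len(main))   # register chosen for the upset
    bit = rng.randrange(WORD_BITS)      # bit position chosen for the upset
    main[target] = flip_bit(main[target], bit)

    flagged = lockstep_mismatches(main, shadow)
    if flagged:                         # error flag raised: restore the state
        main = list(saved_state)
    return target, flagged, main


rng = random.Random(0)                  # fixed seed for a repeatable run
initial = [0x00000000, 0xDEADBEEF, 0x12345678, 0xFFFFFFFF]
target, flagged, restored = inject_and_recover(initial, rng)
print(f"upset register {target}, flagged {flagged}, restored={restored == initial}")
```

Because a single bit flip always changes the register value, the comparison flags exactly the upset register in this model. Real upsets may also hit logic that the comparator does not observe, which is why the measured post-hardening rates need not be zero.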
After the circuit’s functional simulation was run, the platform displayed the register fault statistics, which were used to evaluate the influence of SEUs. The circuit’s vulnerability to SEUs could be observed from the soft error statistics. The post-hardening soft error rates were then measured and compared to the pre-hardening data, providing a quantitative evaluation of the improvements. Using this method, the soft error rates of MCU core modules such as the PFU, DPU, and Cache AXIM were 40.07%, 26.36%, and 27.29%, respectively, before hardening, and were reduced to 0%, 0.69%, and 1.11% after hardening. The hardened and non-hardened designs of the entire core were implemented on an FPGA. The total resource utilization of the triple modular redundancy (TMR) design is 111 984, measured as the number of look-up tables (LUTs) and registers consumed in the FPGA. The total resource utilization of this work is 78 034, so the ratio of resource utilization between this work and TMR is approximately 69.68%. The error recovery time of the hardened MCU processor was analyzed using the completion cycles of a bubble sort algorithm as a benchmark. The average number of recovery cycles is 36 479.06 with the software checkpoint roll-back method and 26 922.5 with this work, a ratio of about 73.8%. Assessments through random fault injection and FPGA implementation indicate that this approach effectively reduces processor faults caused by soft errors while improving resource utilization and efficiency relative to TMR.
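The two ratios quoted above follow directly from the reported figures. The short check below reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Figures quoted in the abstract.
TMR_RESOURCES = 111_984          # LUTs + registers, TMR design
THIS_WORK_RESOURCES = 78_034     # LUTs + registers, lockstep design
CHECKPOINT_CYCLES = 36_479.06    # average recovery cycles, checkpoint roll-back
THIS_WORK_CYCLES = 26_922.5      # average recovery cycles, this work


def percent_ratio(numerator, denominator):
    """Express numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


resource_ratio = percent_ratio(THIS_WORK_RESOURCES, TMR_RESOURCES)
recovery_ratio = percent_ratio(THIS_WORK_CYCLES, CHECKPOINT_CYCLES)

print(f"resource ratio: {resource_ratio:.2f}%")   # 69.68%
print(f"recovery ratio: {recovery_ratio:.1f}%")   # 73.8%
```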