• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (9): 1691-1699.

• Artificial Intelligence and Data Mining •

A fault tolerance scheme for memristive neural network under stuck-at faults

CHENG Qihong1, LIU Peng1, YAO Lian1, YOU Zhiqiang2, WU Jigang1

  1. (1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China;
    2. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China)
  • Received: 2024-11-04 Revised: 2024-12-05 Online: 2025-09-25 Published: 2025-09-22
  • Funding:
    National Natural Science Foundation of China (62174038, 62374047)

Abstract: Resistive random access memory (RRAM) shows great potential for accelerating neural network computation: owing to characteristics such as non-volatility and low latency, it implements vector-matrix multiplication efficiently while avoiding large amounts of data transfer. However, stuck-at faults (SAFs) can severely degrade the inference accuracy of RRAM-based neural networks. This paper proposes a fault-tolerance scheme for SAFs, including weight mapping adjustment, weight range modification, and loss function regularization, which aims to minimize the weight deviations introduced by SAFs. Comprehensive evaluation on image recognition tasks across different neural networks shows that the proposed scheme effectively recovers the accuracy lost to SAFs: even with 10% SAFs, the average accuracy loss does not exceed 1.5%.

Key words: memristor, neural network, stuck-at fault, fault-tolerant computing
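
To make the fault model in the abstract concrete, the following minimal Python sketch shows how stuck-at faults could perturb weights stored in a crossbar and, through them, a vector-matrix product. It is an illustration under stated assumptions, not the paper's method: the function names (`inject_saf`, `vmm`, `mean_abs_deviation`), the normalized conductance range [0, 1], the convention that SA0 pins a cell to the lowest conductance and SA1 to the highest, and the 8%/2% split of the 10% fault rate are all hypothetical choices made for this example.

```python
import random

# Assumed normalized conductance bounds of a single RRAM cell.
G_OFF, G_ON = 0.0, 1.0


def inject_saf(weights, sa0_rate, sa1_rate, rng):
    """Return a copy of `weights` with stuck-at faults applied cell by cell.

    Each cell independently becomes SA0 with probability `sa0_rate`
    and SA1 with probability `sa1_rate`; otherwise it keeps its value.
    """
    faulty = []
    for row in weights:
        new_row = []
        for w in row:
            r = rng.random()
            if r < sa0_rate:
                new_row.append(G_OFF)  # SA0: assumed stuck at lowest conductance
            elif r < sa0_rate + sa1_rate:
                new_row.append(G_ON)   # SA1: assumed stuck at highest conductance
            else:
                new_row.append(w)      # healthy cell keeps its programmed value
        faulty.append(new_row)
    return faulty


def vmm(x, weights):
    """Vector-matrix product y[j] = sum_i x[i] * weights[i][j]."""
    cols = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(cols)]


def mean_abs_deviation(a, b):
    """Average absolute difference between two equally shaped matrices."""
    n = sum(len(row) for row in a)
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    ideal = [[rng.random() for _ in range(4)] for _ in range(8)]
    # 10% total faults; the 8%/2% split is an arbitrary illustrative choice.
    faulty = inject_saf(ideal, sa0_rate=0.08, sa1_rate=0.02, rng=rng)
    x = [1.0] * 8
    print("ideal  output:", vmm(x, ideal))
    print("faulty output:", vmm(x, faulty))
    print("mean |dW|    :", mean_abs_deviation(ideal, faulty))
```

The weight deviation reported by `mean_abs_deviation` is the kind of quantity the scheme described in the abstract aims to minimize; how the paper's mapping, range, and regularization methods achieve that is not specified in this abstract and is not modeled here.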