• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (9): 1700-1710.

• Artificial Intelligence and Data Mining

基于MMD-GA的深度学习测试集优化约简

王凤英1,2,宋子凯2,张岩1,杜利明1   

  1. (1.宿迁学院信息工程学院,江苏 宿迁 223800;2.沈阳建筑大学计算机科学与工程学院,辽宁 沈阳 110168)

  • 收稿日期:2023-11-21 修回日期:2024-05-09 出版日期:2025-09-25 发布日期:2025-09-22
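  • Supported by:
    Jiangsu Province Industry-University-Research Cooperation Project (BY20231232); Suqian University Talent Introduction Research Start-up Fund (校2022XRC091)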

Optimization and reduction for deep learning test set based on MMD-GA

WANG Fengying (1,2), SONG Zikai (2), ZHANG Yan (1), DU Liming (1)

  (1. School of Information Engineering, Suqian University, Suqian 223800;
    2. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China)
  • Received: 2023-11-21 Revised: 2024-05-09 Online: 2025-09-25 Published: 2025-09-22

摘要: 在图像识别领域,测试用例冗余且标记标签仍需人工操作,对测试用例进行优化是解决测试代价高昂、测试效率低下的有效方法。基于此,提出一种基于进化算法的测试用例优化约简方法—ERIR,使用深度神经网络模型提取图像特征,代入HDBSCAN聚类算法分析原始测试集数据分布,在聚类结果的基础上以最小化测试子集与原始分布为目标设计进化算法。提出了基于最大均值差异与遗传算法融合的测试用例挑选算法—MMD-GA,能够在每个聚类簇中挑选出最具有代表性的原型构成测试子集。应用该算法在CNN结构和Transformer结构模型上进行了大量实验,结果显示挑选出的测试输入在提升时间效率的基础上保证了准确率接近原始测试集,对比整体测试集准确率平均误差在0.18%~2.32%。

关键词: 测试用例约简, 深度学习, 图像识别, 遗传算法, 软件测试

Abstract: In image recognition, test sets are often redundant and labeling still has to be done manually, so optimizing the test cases is an effective way to reduce high testing costs and low testing efficiency. To this end, a test case optimization and reduction method based on an evolutionary algorithm, named ERIR, is proposed. It uses a deep neural network model to extract image features, which are then fed into the HDBSCAN clustering algorithm to analyze the data distribution of the original test set. Based on the clustering results, an evolutionary algorithm is designed with the goal of minimizing the difference between the distribution of the test subset and that of the original test set. A test case selection algorithm that combines maximum mean discrepancy (MMD) with a genetic algorithm (GA), named MMD-GA, is also proposed; it selects the most representative prototypes from each cluster to form the test subset. Extensive experiments were carried out with this algorithm on CNN-based and Transformer-based models. The results show that the selected test inputs improve time efficiency while keeping the accuracy close to that measured on the original test set: the average accuracy error relative to the full test set ranges from 0.18% to 2.32%.

Key words: test case set reduction, deep learning, image recognition, genetic algorithm, software testing
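
The per-cluster selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel, the genetic operators, or any hyperparameters, so the RBF kernel, the truncation selection, the union-based crossover, the single-index mutation, and names such as `select_prototypes` are all assumptions made for illustration. The input is taken to be the feature vectors of one cluster (for example, one HDBSCAN cluster of DNN features), and the search looks for `k` samples whose MMD to the whole cluster is small.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def select_prototypes(features, k, pop_size=30, generations=40,
                      mutation_rate=0.2, gamma=1.0, seed=0):
    """Toy genetic search for k row indices whose features minimize MMD^2
    against the whole cluster. All hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = len(features)

    def fitness(ind):
        return mmd2(features[ind], features, gamma)

    # Each individual is a sorted array of k distinct row indices.
    population = [np.sort(rng.choice(n, size=k, replace=False))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            i, j = rng.choice(len(survivors), size=2, replace=False)
            # Crossover: draw k indices from the union of both parents.
            pool = np.union1d(survivors[i], survivors[j])
            child = rng.choice(pool, size=k, replace=False)
            # Mutation: replace one index with an index not yet in the child.
            if rng.random() < mutation_rate:
                unused = np.setdiff1d(np.arange(n), child)
                if unused.size > 0:
                    child[rng.integers(k)] = rng.choice(unused)
            children.append(np.sort(child))
        population = survivors + children
    return min(population, key=fitness)
```

Calling `select_prototypes(cluster_features, k=8)` returns eight row indices for one cluster; repeating this for every cluster and concatenating the results would give a reduced test subset in the spirit of the method. Because the best half of each generation is carried over unchanged, the best MMD value found never gets worse from one generation to the next.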