Optimization and reduction for deep learning test set based on MMD-GA

Computer Engineering & Science, 2025, Vol. 47, Issue (9): 1700-1710.

• Artificial Intelligence and Data Mining •

WANG Fengying1,2, SONG Zikai2, ZHANG Yan1, DU Liming1

(1. School of Information Engineering, Suqian University, Suqian 223800, China;
2. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China)

Received: 2023-11-21 Revised: 2024-05-09 Online: 2025-09-25 Published: 2025-09-22
Supported by:
Jiangsu Province Industry-University-Research Cooperation Project (BY20231232); Suqian University Talent Introduction Research Start-up Fund (校2022XRC091)

Abstract

In image recognition, test sets contain redundant test cases, and labeling them still requires manual effort. Optimizing the test cases is therefore an effective way to address high testing cost and low testing efficiency. To this end, a test case optimization and reduction method based on an evolutionary algorithm, named ERIR, is proposed. It uses a deep neural network model to extract image features and feeds them into the HDBSCAN clustering algorithm to analyze the data distribution of the original test set. Based on the clustering results, an evolutionary algorithm is designed with the goal of minimizing the difference between the distribution of the test subset and that of the original test set. A test case selection algorithm that combines maximum mean discrepancy with a genetic algorithm, named MMD-GA, is also proposed; it selects the most representative prototypes from each cluster to form the test subset. Extensive experiments were carried out on models with CNN and Transformer architectures. The results show that the selected test inputs improve time efficiency while keeping accuracy close to that measured on the original test set: the average error relative to the accuracy on the full test set ranges from 0.18% to 2.32%.

Key words: test case set reduction, deep learning, image recognition, genetic algorithm, software testing
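
The abstract gives no implementation details for MMD-GA, so the following is only a minimal sketch of the general idea: within one cluster of feature vectors, a genetic algorithm searches for a fixed-size subset whose maximum mean discrepancy (MMD) to the whole cluster is small. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `rbf_kernel`, `mmd2`, and `select_prototypes`, the RBF kernel and its `gamma`, the biased MMD estimator, and the GA operators (truncation selection, union-based crossover, single-index mutation). The features are assumed to be already extracted and clustered.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    d2 = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())


def select_prototypes(cluster, k, pop_size=20, generations=30,
                      mutation_rate=0.2, seed=0):
    """Toy GA: evolve index subsets of size k with low MMD^2 to the cluster.

    Assumed operators (not from the paper): keep the better half of the
    population, build children from the union of two parents, and
    occasionally swap one index for an unused one.
    """
    n = len(cluster)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(cluster)")
    rng = np.random.default_rng(seed)

    def fitness(idx):
        return mmd2(cluster[idx], cluster)

    population = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            pool = np.union1d(parents[i], parents[j])  # crossover pool
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < mutation_rate:
                unused = np.setdiff1d(np.arange(n), child)
                child[rng.integers(k)] = rng.choice(unused)  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Synthetic stand-in for the feature vectors of one cluster.
    features = np.random.default_rng(1).normal(size=(100, 16))
    chosen = select_prototypes(features, k=10)
    print("selected indices:", sorted(chosen.tolist()))
    print("MMD^2 to full cluster:", mmd2(features[chosen], features))
```

Because the better half of each generation is carried over unchanged, the best MMD value found never gets worse from one generation to the next. In a full pipeline this selection would presumably be repeated per cluster and the chosen indices merged into the reduced test set.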

WANG Fengying, SONG Zikai, ZHANG Yan, DU Liming. Optimization and reduction for deep learning test set based on MMD-GA[J]. Computer Engineering & Science, 2025, 47(9): 1700-1710.
