• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (07): 1170-1177.

• High Performance Computing •

A parallel balanced cascade support vector machine

LIU Yi-cheng, LIU Xiao-yan, YAN Xin

  1. (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2022-05-03 Revised: 2022-09-26 Accepted: 2023-07-25 Online: 2023-07-25 Published: 2023-07-11

Abstract: The cascade support vector machine (CSVM) partitions a dataset into groups and trains on the sub-datasets in parallel, which greatly shortens training time and reduces memory usage. However, the model obtained this way deviates somewhat from the model obtained by training directly on the full dataset. First, the causes of the error introduced by grouped training are analyzed, and the ideal groupings that introduce no error are summarized. Then, a balanced cascade support vector machine (BCSVM) algorithm is proposed. The algorithm balances the sample proportions within each sub-dataset after grouping so that they match those of the original dataset, and during grouped training the parameter values can be adjusted to obtain more support vectors, lowering the probability of losing global support vectors. The effectiveness of BCSVM is also discussed, showing that models obtained with it achieve higher prediction accuracy than those obtained with randomly grouped CSVM. Finally, experiments on several common datasets show that the accuracy error of models trained with BCSVM falls from the previous 1% to about 0.1%, i.e., by one order of magnitude.

Key words: parallel computing, support vector machine, chunking, balanced subset, parameter scaling
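
The balanced grouping step described in the abstract can be illustrated with a short sketch. The abstract does not give the authors' exact procedure, so what follows is only one plausible way to obtain sub-datasets whose class proportions match the original dataset: a stratified round-robin split. The function name `balanced_groups` and the parameters `n_groups` and `seed` are hypothetical, and the per-group SVM training and parameter scaling are deliberately left out.

```python
import random
from collections import Counter, defaultdict


def balanced_groups(labels, n_groups, seed=0):
    """Split sample indices into n_groups whose class mix mirrors the full set.

    Assumption: a stratified round-robin split, not necessarily the
    procedure used in the paper.
    """
    rng = random.Random(seed)

    # Collect the indices of the samples belonging to each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    groups = [[] for _ in range(n_groups)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin so every group receives an equal
        # share (within one sample). Starting where the previous class
        # stopped also keeps the overall group sizes even.
        for i, idx in enumerate(members):
            groups[(offset + i) % n_groups].append(idx)
        offset = (offset + len(members)) % n_groups
    return groups


# Usage: a 3:1 imbalanced toy label vector split into four groups.
toy_labels = [0] * 90 + [1] * 30
for group in balanced_groups(toy_labels, n_groups=4):
    print(sorted(Counter(toy_labels[i] for i in group).items()))
```

Each group could then be handed to a separate worker for SVM training, with the resulting support vectors merged in later cascade layers. A purely random split, by contrast, can leave a group with too few samples of a minority class, which is the kind of imbalance the abstract attributes part of the CSVM error to.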