• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (08): 1414-1422.

• Graphics and Images •

Prediction of breast cancer based on C-AdaBoost model

LI Yong1, CHEN Si-xuan1, JIA Hai2, WANG Xia2

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  2. Department of Pharmacy, the People's Hospital of Gansu Province, Lanzhou 730000, China

  • Received: 2019-12-12 Revised: 2020-02-27 Accepted: 2020-08-25 Online: 2020-08-25 Published: 2020-08-29
  • Supported by:
    National Natural Science Foundation of China (71764025, 61863032, 61662070); Research Project of the Gansu Provincial Administration of Traditional Chinese Medicine (GZK-2019-40); Gansu Provincial Education Science Planning Project (GS[2018]GHBBKZ021); Scientific Research Project of Higher Education Institutions of Gansu Province (2018A-001); Northwest Normal University Young Teachers' Research Capability Improvement Program (NWNU-LKQN-17-9)

Abstract: Machine learning and deep learning techniques can address many problems in medical classification and prediction, but the accuracy of classification algorithms varies: some predict with high accuracy, while others are limited. This paper proposes an ensemble learning algorithm based on the C-AdaBoost model to predict breast cancer. Stepwise regression is used to perform a second round of selection on the existing features, and the selected features are combined with the C-AdaBoost model to improve prediction. Extensive experiments show that 1) the approach finds the optimal feature combinations for determining whether breast cancer recurs and whether a breast tumor is benign, and 2) the proposed algorithm improves prediction accuracy by up to 19.5% compared with machine learning classifiers such as SVM, Naive Bayes, and RandomForest and with traditional ensemble learning models, which can better support doctors in clinical decision-making.
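
The abstract does not define C-AdaBoost or give the stepwise regression procedure, so the following is only a rough sketch of the general idea of pairing feature selection with a boosted ensemble. It assumes standard discrete AdaBoost over decision stumps, and it swaps in a greedy forward wrapper selection (scored by validation accuracy) in place of the paper's stepwise regression. All function names, the toy data, and the label convention (+1 benign, -1 malignant) are hypothetical and are not taken from the paper.

```python
import math

def stump_predict(row, f, thr, pol):
    # Decision stump: predict `pol` when feature f is <= thr, else `-pol`.
    return pol if row[f] <= thr else -pol

def train_stump(X, y, w, features):
    # Exhaustively search the allowed features for the stump with the
    # lowest weighted error; returns (error, feature, threshold, polarity).
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost_fit(X, y, features, rounds=5):
    # Standard discrete AdaBoost restricted to the given feature indices.
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w, features)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    return sum(adaboost_predict(model, r) == yi for r, yi in zip(X, y)) / len(X)

def forward_select(X_tr, y_tr, X_va, y_va, n_features, rounds=5):
    # Greedy forward selection (a stand-in for stepwise regression):
    # add the feature that most improves validation accuracy and stop
    # as soon as no candidate improves it.
    selected, best_acc = [], 0.0
    while len(selected) < n_features:
        candidates = []
        for f in range(n_features):
            if f in selected:
                continue
            model = adaboost_fit(X_tr, y_tr, selected + [f], rounds)
            candidates.append((accuracy(model, X_va, y_va), f))
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, f = max(candidates, key=lambda c: (c[0], -c[1]))
        if acc <= best_acc:
            break
        selected.append(f)
        best_acc = acc
    return selected, best_acc

# Toy data (hypothetical): feature 0 separates the classes, features 1-2 do not.
X_train = [[1, 5, 3], [2, 1, 7], [3, 4, 2], [6, 2, 6], [7, 5, 1], [8, 3, 4]]
y_train = [1, 1, 1, -1, -1, -1]
X_val = [[2.5, 3, 5], [1.5, 4, 1], [6.5, 1, 3], [7.5, 5, 6]]
y_val = [1, 1, -1, -1]

selected, val_acc = forward_select(X_train, y_train, X_val, y_val, n_features=3)
print(selected, val_acc)
```

On this toy data the selection keeps only the separating feature and stops, since adding either of the other two cannot raise validation accuracy further. A real study would use proper cross-validation and a statistical stepwise criterion rather than this simplified wrapper.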


Key words: ensemble learning, stepwise regression, feature selection, disease prediction