• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

J4, 2016, Vol. 38, Issue (07): 1330-1337.


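An ensemble feature selection algorithm for high-dimensional microarray data

SUN Gang1,2, ZHANG Jing1,3

  (1. School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009;
    2. School of Computer and Information Engineering, Fuyang Teachers College, Fuyang, Anhui 236037;
    3. State Grid Anhui Information and Telecommunication Company, Hefei, Anhui 230061, China)
  • Received: 2015-05-25 Revised: 2015-09-01 Online: 2016-07-25 Published: 2016-07-25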
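  • Funding:

    National Natural Science Foundation of China (51174257/F030504); Fundamental Research Funds for the Central Universities (2013BHZX0040); Key Special Project Commissioned by Anhui Provincial Scientific Research Institutions (2013WLGH01ZD)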

130X.2016.07.005

Abstract:

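Feature selection algorithms are an important tool for microarray data analysis, and their classification performance and stability are critical to that analysis. To improve both, this paper proposes an ensemble feature selection algorithm for high-dimensional microarray data that compensates for the limited information carried by any single gene subset. The algorithm first uses the signal-to-noise ratio (SNR) method to select several discriminative genes. Then, for each discriminative gene, it uses the conditional information correlation coefficient to evaluate the correlation between each candidate gene and that discriminative gene, producing multiple relevant gene subsets. Finally, it integrates these gene subsets through ensemble learning. Experimental results show that, in most cases, both the classification performance and the stability of the proposed ensemble algorithm are better than those of methods that select only a single gene subset.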

Key words: microarray data; signal-to-noise ratio; conditional correlation coefficient; feature selection
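The three steps summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The SNR score uses the common form (mu1 - mu0) / (sigma1 + sigma0). Because the abstract does not define the conditional (information) correlation coefficient, the sketch substitutes the partial correlation between a candidate gene and a discriminative gene given the class label, and it assumes each subset consists of the discriminative gene plus the candidates scoring highest on that measure. The ensemble step is also an assumption: one nearest-centroid classifier per subset, combined by majority vote. The function names, the parameters `n_disc` and `subset_size`, and the toy data are hypothetical.

```python
# Hypothetical sketch: partial correlation stands in for the paper's
# conditional correlation coefficient; nearest-centroid voting stands in
# for its (unspecified) ensemble learner.
import math
from collections import Counter
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def snr(values, labels, eps=1e-9):
    """Signal-to-noise ratio (mu1 - mu0) / (sigma1 + sigma0) for labels 1/0."""
    pos = [v for v, c in zip(values, labels) if c == 1]
    neg = [v for v, c in zip(values, labels) if c == 0]
    return (mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + eps)


def partial_corr(g, d, y):
    """Correlation of genes g and d with the class label y partialled out."""
    r_gd, r_gy, r_dy = pearson(g, d), pearson(g, y), pearson(d, y)
    denom = math.sqrt(max(1 - r_gy ** 2, 0.0) * max(1 - r_dy ** 2, 0.0))
    return (r_gd - r_gy * r_dy) / denom if denom > 0 else 0.0


def build_subsets(X, y, n_disc=3, subset_size=3):
    """Steps 1-2: pick discriminative genes by |SNR|, grow one subset per gene."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    ranked = sorted(range(len(cols)), key=lambda j: abs(snr(cols[j], y)), reverse=True)
    subsets = []
    for d in ranked[:n_disc]:
        # Rank the remaining genes by |partial correlation| with gene d.
        others = sorted((j for j in range(len(cols)) if j != d),
                        key=lambda j: abs(partial_corr(cols[j], cols[d], y)),
                        reverse=True)
        subsets.append([d] + others[:subset_size - 1])
    return subsets


def fit_ensemble(X, y, subsets):
    """Step 3 (assumed): one nearest-centroid classifier per gene subset."""
    models = []
    for genes in subsets:
        centroids = {}
        for c in sorted(set(y)):
            rows = [row for row, lab in zip(X, y) if lab == c]
            centroids[c] = [mean([row[g] for row in rows]) for g in genes]
        models.append((genes, centroids))
    return models


def predict_ensemble(models, x):
    """Majority vote of the per-subset classifiers for one sample x."""
    votes = Counter()
    for genes, centroids in models:
        dist = {c: sum((x[g] - m) ** 2 for g, m in zip(genes, cent))
                for c, cent in centroids.items()}
        votes[min(dist, key=dist.get)] += 1
    return votes.most_common(1)[0][0]


# Toy data: 6 samples (rows) x 5 genes (columns); classes labelled 1 and 0.
X = [
    [5.0, 2.0, 3.0, 1.0, 4.0],
    [6.0, 1.0, 3.5, 2.0, 4.5],
    [5.5, 3.0, 2.5, 1.5, 5.0],
    [1.0, 2.5, 1.0, 1.8, 2.0],
    [1.5, 1.5, 1.5, 1.2, 2.5],
    [1.0, 2.0, 0.5, 1.6, 3.0],
]
y = [1, 1, 1, 0, 0, 0]
subsets = build_subsets(X, y, n_disc=3, subset_size=3)
models = fit_ensemble(X, y, subsets)
print(subsets)
print([predict_ensemble(models, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```

Each discriminative gene seeds exactly one subset, so `n_disc` also fixes the ensemble size; an odd value avoids tied votes in a two-class problem.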

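The abstract reports stability alongside classification performance but does not say how stability is measured. A common choice, assumed here rather than taken from the paper, is to rerun gene selection on resampled training sets and average the pairwise Kuncheva consistency index of the selected subsets. For two size-k subsets sharing r of n genes, the index is (r*n - k*k) / (k*(n - k)): it equals 1 for identical selections and is near 0 when the overlap is what random selection would give. The function names and example subsets below are hypothetical.

```python
# Hypothetical sketch: stability as the mean pairwise Kuncheva index
# (an assumed metric; the paper's own stability measure is not stated here).
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Kuncheva consistency index for two feature subsets of equal size k."""
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b):
        raise ValueError("subsets must have the same size")
    if k == 0 or k >= n_features:
        raise ValueError("index is undefined for k == 0 or k >= n_features")
    r = len(a & b)  # number of genes selected in both runs
    return (r * n_features - k * k) / (k * (n_features - k))


def average_stability(subsets, n_features):
    """Mean pairwise Kuncheva index over subsets from repeated selection runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# Hypothetical gene indices selected in three resampling runs out of 100 genes.
runs = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6, 7]]
print(round(average_stability(runs, n_features=100), 4))  # 0.566
```

Correcting for chance overlap matters here because microarray data have thousands of genes; plain set overlap would rate two small random selections as more similar than they really are only when k is large relative to n, and the Kuncheva correction keeps scores comparable across subset sizes.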