• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

Selecting attributes and granularity for data with test cost constraint

LIAO Shujiao (廖淑娇)1,2, ZHU Qingxin (朱清新)1, LIANG Rui (梁锐)1

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
  2. School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, Fujian, China
  • Received: 2017-03-20 Revised: 2017-05-26 Online: 2018-08-25 Published: 2018-08-25
  • Supported by:

    National Natural Science Foundation of China, General Program (61379021); Natural Science Foundation of Fujian Province, General Program (2017J01771); Class A Project of the Education Department of Fujian Province (JAT160291); funded project of the Digital Fujian Institute of Meteorological Big Data and the Key Laboratory of Data Science and Statistics

Abstract:

Test cost and misclassification cost are commonly considered in cost-sensitive learning. In real applications, the test cost of an attribute often depends on the granularity of its values, and the misclassification cost of an object with multiple attributes is usually influenced by the total test cost of those attributes. Based on this observation, this paper studies the selection of attributes and data granularity when the total test cost is constrained. With the goal of minimizing the average total cost of data processing, a method is proposed that selects the optimal attribute subset and the optimal data granularity simultaneously. A theoretical model of the method is constructed first, and an efficient algorithm is then designed. Experimental results show that the proposed algorithm can effectively select attributes and data granularity under test cost constraints of different sizes.

Key words: cost, error, neighborhood, attribute selection, granularity selection