• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (02): 370-375.

• Articles •

An improved associative classification approach based on support and enhancement ratio

WANG Weiping, ZHOU Zhongmei, ZHENG Yifeng

  1. (School of Computer, Minnan Normal University, Zhangzhou 363000, China)
  • Received: 2015-04-29 Revised: 2015-07-03 Online: 2016-02-25 Published: 2016-02-25
  • Supported by:

    National Natural Science Foundation of China (61170129); Minnan Normal University Graduate Research Fund (YJS201434)

Abstract:

Associative classification is an important classification technique, and state-of-the-art methods commonly adopt a framework based on support and confidence. However, support alone is too simple a measure of an itemset's classification power, and confidence cannot measure the correlation between an itemset and a class, so methods built on these two measures tend to generate many inferior rules. In this paper, we propose an improved associative classification approach based on support and enhancement ratio (ACSER). ACSER considers the support of an itemset both in its target class and in the complement class. First, frequent enhancement ratio patterns are extracted from the training data as candidate classification rules. Second, ACSER measures rule intensity by confidence and enhancement ratio, then ranks and prunes the extracted rules according to that intensity. Finally, ACSER selects the best k rules to predict unknown objects. Experiments on 16 UCI datasets show that ACSER achieves higher classification accuracy than traditional approaches based on support and confidence.

Key words: data mining; associative classification; frequent itemset; rule intensity; classification accuracy
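
The three steps named in the abstract (extract candidate rules, rank them, predict with the best k) can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not define the enhancement ratio, so the sketch assumes it is the support of an itemset in the target class divided by its support in the complement class. The function names, the thresholds `min_support` and `min_ratio`, the brute-force enumeration of itemsets, the sort order, and the confidence-weighted vote are likewise illustrative assumptions, and no pruning beyond the two thresholds is attempted.

```python
from collections import defaultdict
from itertools import combinations


def class_support(itemset, rows):
    """Fraction of rows (each a set of items) that contain the whole itemset."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if itemset <= row) / len(rows)


def mine_rules(data, min_support=0.2, min_ratio=2.0, max_len=2):
    """Build ranked rules from data, a list of (frozenset_of_items, label).

    ASSUMPTION: enhancement ratio = support in target class /
    support in complement class (not defined in the abstract).
    """
    by_class = defaultdict(list)
    for items, label in data:
        by_class[label].append(items)
    all_items = sorted({i for items, _ in data for i in items})

    rules = []
    for label, own_rows in by_class.items():
        rest_rows = [items for items, lab in data if lab != label]
        # Brute-force enumeration, for illustration only.
        for size in range(1, max_len + 1):
            for combo in combinations(all_items, size):
                itemset = frozenset(combo)
                s_own = class_support(itemset, own_rows)
                if s_own < min_support:
                    continue  # not frequent in the target class
                s_rest = class_support(itemset, rest_rows)
                ratio = float("inf") if s_rest == 0 else s_own / s_rest
                if ratio < min_ratio:
                    continue  # not distinctive enough for this class
                covered = sum(1 for items, _ in data if itemset <= items)
                hits = sum(1 for items, lab in data
                           if itemset <= items and lab == label)
                rules.append({"itemset": itemset, "label": label,
                              "support": s_own, "ratio": ratio,
                              "confidence": hits / covered})
    # Assumed ranking: confidence first, then ratio, then support.
    rules.sort(key=lambda r: (r["confidence"], r["ratio"], r["support"]),
               reverse=True)
    return rules


def predict(rules, items, k=3, default=None):
    """Vote with the k highest-ranked rules whose itemset matches items."""
    matched = [r for r in rules if r["itemset"] <= items][:k]
    if not matched:
        return default
    votes = defaultdict(float)
    for r in matched:
        votes[r["label"]] += r["confidence"]  # assumed voting scheme
    return max(votes, key=votes.get)


# Hypothetical toy data: each record is (items, class label).
toy = [
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "c"}), "pos"),
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"c", "d"}), "neg"),
    (frozenset({"b", "d"}), "neg"),
    (frozenset({"d"}), "neg"),
]
rules = mine_rules(toy)
```

Calling `predict(rules, frozenset({"a", "b"}))` on the toy data then classifies a new record from its matching rules. A practical implementation would replace the exhaustive enumeration with a frequent-pattern miner and would add the rule pruning step that the abstract mentions.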