• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (8): 98-100. doi: 10.3969/j.issn.1007130X.2010.

• Papers •

基于扩展概念格模型的文本分类规则提取的研究

Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model

ZHOU Wan (周顽), ZHOU Caixue (周才学)

  1. (School of Information Science and Technology, Jiujiang University, Jiujiang 332005, Jiangxi, China)
  • Received: 2009-05-22 Revised: 2009-09-10 Online: 2010-07-25 Published: 2010-07-28
  • About the authors: ZHOU Wan (born 1976), male, from Huangmei, Hubei; associate professor; research interests: data mining and Web technology. ZHOU Caixue, associate professor; research interest: network security.

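Abstract (translated from the Chinese):

Text categorization is a research hotspot and a core technology in information retrieval and data mining, and in recent years it has received wide attention and developed rapidly. The concept lattice is an effective tool for rule extraction and data analysis, but the efficiency of constructing it has always been a major obstacle to its application. This paper studies the extraction of text categorization rules based on the extended concept lattice model, using rough sets together with the extended concept lattice model to extract the rules. By using a concept tree, the method removes a large share of the redundant concepts: only a small number of concepts need to be built to extract all of the classification rules, so it is more efficient while yielding the same classification rules as the concept lattice. Experiments with the algorithm in MATLAB 7.0 show that its recall is slightly lower than that of the KNN and SVM algorithms, while its precision is higher than both; the classification rules therefore perform comparably to KNN and SVM when used for text categorization.

Keywords: text categorization, data mining, rough set, concept lattice, categorization rule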

Abstract:

Automatic text categorization is a foundation of text mining, and text feature selection is the core of text categorization. The concept lattice is an effective tool for rule extraction and data analysis; however, it is inefficient to build. This paper extracts text categorization rules based on the extended concept lattice model, exploiting the concept lattice for rule extraction while eliminating useless concepts. The method can extract all rules from only a few concepts, which makes it efficient. Experiments run in MATLAB 7.0 show that the algorithm's recall is slightly lower than that of KNN and SVM, but its precision is higher than both. Therefore, when the classification rules are applied to text categorization, the results are comparable with those of KNN and SVM.

Key words: document categorization; data mining; rough set; concept lattice; categorization rule
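
For readers unfamiliar with the structure the abstract refers to, the following is a minimal sketch of how classification rules can be read off a concept lattice. It is not the paper's extended concept lattice or concept tree algorithm: it naively enumerates every formal concept of a hypothetical four-document corpus (the `DOCS` table, the term names, the class labels, and the function names are all invented for illustration) and keeps each concept whose documents share a single class.

```python
from itertools import combinations

# Hypothetical toy corpus (invented for illustration):
# document id -> (set of index terms, class label).
DOCS = {
    "d1": (frozenset({"goal", "match"}), "sports"),
    "d2": (frozenset({"goal", "team", "match"}), "sports"),
    "d3": (frozenset({"stock", "market"}), "finance"),
    "d4": (frozenset({"stock", "team"}), "finance"),
}


def extent(intent):
    """Return the documents whose term sets contain every term in `intent`."""
    return frozenset(d for d, (terms, _) in DOCS.items() if intent <= terms)


def concepts():
    """Naively enumerate all formal concepts as (extent, intent) pairs.

    Every intent is an intersection of document term sets; the full term
    set is added as the intent of the bottom concept. The enumeration is
    exponential in the number of documents, so it only suits tiny inputs.
    """
    rows = [terms for terms, _ in DOCS.values()]
    intents = {frozenset().union(*rows)}
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            intents.add(frozenset.intersection(*combo))
    return [(extent(i), i) for i in intents]


def rules():
    """Keep concepts whose documents all share one class: intent -> class."""
    found = []
    for ext, intent in concepts():
        labels = {DOCS[d][1] for d in ext}
        # An empty extent yields no labels, so it never produces a rule.
        if intent and len(labels) == 1:
            found.append((intent, next(iter(labels))))
    return found


for intent, label in sorted(rules(), key=lambda r: sorted(r[0])):
    print(sorted(intent), "->", label)
```

On this toy input the general rule `{stock} -> finance` already covers the two more specific finance rules, so the concepts behind them are redundant for classification. That is loosely the kind of redundancy the abstract says the method avoids constructing; how the paper itself prunes such concepts is not shown by this sketch.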