• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal
Article

Research on Text Classification Rule Extraction Based on the Extended Concept Lattice Model

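  • (School of Information Science and Technology, Jiujiang University, Jiujiang, Jiangxi 332005, China)
ZHOU Wan (born 1976), male, native of Huangmei, Hubei, associate professor; research interests: data mining and Web technology. ZHOU Caixue, associate professor; research interest: network security.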

Received date: 2009-05-22

  Revised date: 2009-09-10

  Online published: 2010-07-28

Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model

  • (School of Information Science and Technology, Jiujiang University, Jiujiang 332005, China)

Received date: 2009-05-22

  Revised date: 2009-09-10

  Online published: 2010-07-28

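Abstract

Text classification is a research hotspot and a core technology in information retrieval and data mining, and it has received wide attention and developed rapidly in recent years. The concept lattice is an effective tool for rule extraction and data analysis, but the efficiency of constructing it has always been a major obstacle to its application. This paper studies the extraction of text classification rules based on the extended concept lattice model, using rough sets together with the extended concept lattice model to extract the rules. By using a concept tree, the method removes a large number of redundant concepts, so only a small number of concepts need to be built in order to extract all of the classification rules. The method is efficient, and the rules it extracts are the same as those obtained from the full concept lattice. Experiments running the algorithm in MATLAB 7.0 show that its recall is slightly lower than that of the KNN and SVM algorithms, while its precision is higher than both, so the classification rules perform comparably to KNN and SVM when used for text classification.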

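Cite this article

ZHOU Wan, ZHOU Caixue. Research on the Extracting Rules of Text Categorization Based on the Extended Concept Lattice Model[J]. Computer Engineering & Science, 2010, 32(8): 98-100. DOI: 10.3969/j.issn.1007130X.2010.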

Abstract

Automatic text categorization is the foundation of text mining, and text feature selection is the core of text categorization. The concept lattice is a very effective tool for rule extraction and data analysis; however, it is very inefficient to build. This paper extracts text categorization rules based on the extended concept lattice model, taking advantage of the concept lattice for rule extraction while eliminating useless concepts. The method can extract all of the rules from only a few concepts, which makes it efficient. Experiments run in MATLAB 7.0 show that the algorithm's recall is slightly lower than that of KNN and SVM, but its precision is higher than both. Therefore, when the classification rules are applied to text categorization, the effect is comparable to that of KNN and SVM.
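To make the idea of rule extraction from a concept lattice concrete, the following is a minimal Python sketch of the plain (non-extended) formal concept approach. It is not the paper's algorithm: the abstract does not give the details of the extended concept lattice or the concept tree, so this sketch simply enumerates every formal concept by brute force and reads rules off the intents. The toy context, the `class=` prefix for decision attributes, and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Toy formal context (hypothetical data): document -> set of attributes.
# Attributes prefixed with "class=" play the role of decision attributes.
CONTEXT = {
    "d1": {"goal", "match", "class=sports"},
    "d2": {"goal", "team", "class=sports"},
    "d3": {"stock", "market", "class=finance"},
    "d4": {"stock", "team", "class=finance"},
}


def common_attributes(docs):
    """Intent: the attributes shared by every document in docs."""
    sets = [CONTEXT[d] for d in docs]
    # By convention, the empty set of documents shares all attributes.
    return set.intersection(*sets) if sets else set().union(*CONTEXT.values())


def matching_documents(attrs):
    """Extent: the documents that contain every attribute in attrs."""
    return {d for d, a in CONTEXT.items() if attrs <= a}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every document subset.

    This is exponential in the number of documents, which illustrates why
    lattice construction cost is a practical concern.
    """
    concepts = set()
    docs = sorted(CONTEXT)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            intent = common_attributes(subset)
            extent = matching_documents(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def classification_rules(concepts):
    """Emit (terms, label) for each concept whose intent has exactly one label."""
    rules = set()
    for extent, intent in concepts:
        labels = {a for a in intent if a.startswith("class=")}
        terms = intent - labels
        if extent and terms and len(labels) == 1:
            rules.add((frozenset(terms), next(iter(labels))))
    return rules


if __name__ == "__main__":
    for terms, label in sorted(classification_rules(formal_concepts()), key=str):
        print(sorted(terms), "->", label)
```

In this toy context the concept with intent `{team}` covers documents from both classes, so it yields no rule, while `{goal}` and `{stock}` each determine a single class. A method that avoids building concepts like the `{team}` one is the kind of saving the abstract attributes to removing redundant concepts.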
