  • Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2008, Vol. 30 ›› Issue (12): 134-136.


Discovering Genre Patterns in Chinese Web Pages Based on Association Rule Mining

WU Chukun (吴楚坤), WU Yangyang (吴扬扬)

  • Publication date: 2008-12-01   Online release date: 2010-05-19

Abstract:

This paper studies the discovery of genre patterns in Chinese web pages using association rule mining. A linked-list structure is used to convert the document collection into a transaction database suitable for association rule mining, with the term items of each transaction kept in the order in which they appear in the original text. The Apriori association rule mining algorithm is then implemented on this database. Experimental results show that the approach works relatively well for discovering the genre patterns of certain categories.

Key words: text classification; genre pattern; association rule
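The abstract names two steps, building an order-preserving transaction database with a linked list and running Apriori, but gives no code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each page has already been segmented into words (the segmenter is not shown). Each document's distinct terms are chained in a singly linked list in order of first appearance and then read out as one transaction, and a textbook Apriori pass counts support and derives rules. The names `TermNode`, `build_term_list`, `to_transaction`, `apriori`, and `rules`, the thresholds, and the toy tokens are all hypothetical. The abstract does not say how the preserved term order is used inside the mining step, so this sketch treats each transaction as a set when counting support.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple


class TermNode:
    """Hypothetical node of a singly linked list holding one term in text order."""

    def __init__(self, term: str) -> None:
        self.term = term
        self.next: Optional["TermNode"] = None


def build_term_list(tokens: Iterable[str]) -> Optional[TermNode]:
    """Append each distinct term at the tail so first-occurrence order is kept."""
    head: Optional[TermNode] = None
    tail: Optional[TermNode] = None
    seen = set()
    for tok in tokens:
        if tok in seen:  # a transaction holds each item at most once
            continue
        seen.add(tok)
        node = TermNode(tok)
        if tail is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_transaction(head: Optional[TermNode]) -> List[str]:
    """Walk the linked list and return its terms in text order."""
    items: List[str] = []
    node = head
    while node is not None:
        items.append(node.term)
        node = node.next
    return items


def apriori(transactions: List[List[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: every itemset whose support count is >= min_support."""
    sets = [frozenset(t) for t in transactions]
    counts: Dict[FrozenSet[str], int] = {}
    for s in sets:
        for item in s:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {k: v for k, v in counts.items() if v >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            support = sum(1 for s in sets if c <= s)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent: Dict[FrozenSet[str], int],
          min_conf: float) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                conf = support / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out


if __name__ == "__main__":
    # Hypothetical, already-segmented toy documents.
    docs = [["reporter", "reports", "today", "reporter"],
            ["reporter", "today", "news"],
            ["abstract", "method", "experiment"]]
    transactions = [to_transaction(build_term_list(d)) for d in docs]
    freq = apriori(transactions, min_support=2)
    for lhs, rhs, conf in rules(freq, min_conf=0.8):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Support is expressed here as an absolute document count; a relative threshold (a fraction of the number of documents) would serve equally well. Dropping repeated terms while building each list matches the usual transaction model, in which an item either occurs in a document or does not, while the linked list still records the order in which terms first appear.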