  • Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2005, Vol. 27 ›› Issue (11): 48-51.

• Articles •

Research on Efficient Association Rules Based on Web Data Mining

陈晓红 秦杨   

  • Online:2005-11-01 Published:2010-06-24

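Abstract (translated from Chinese):

As network resources grow ever richer, Web data mining has become a research focus for making effective use of resources on the Internet. This paper proposes filtering and otherwise processing the XML representation of unstructured Internet data, then converting it into structured data stored in a SQL Server database. On this basis, association rule discovery is used to generate a minimum association rule set in place of the complete association rule set, which effectively prunes weak association rules and greatly reduces the number of candidate frequent itemsets, thereby improving the efficiency of rule discovery. Finally, building on the classical Apriori algorithm, a corresponding efficient algorithm is designed that exploits the upward closure property of weak association rules.

Keywords: Web data mining; data warehouse; association rules; minimum association rule set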
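
The abstract does not give the XML schema or the database layout, so the following is only a minimal sketch of the filter-then-store step under stated assumptions. The element names (`sessions`, `session`, `page`), the `visits` table, and the helper functions are all hypothetical, and Python's built-in `sqlite3` stands in for SQL Server so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML export of Web visits; element and attribute names are
# illustrative assumptions, not taken from the paper.
SAMPLE_XML = """
<sessions>
  <session id="s1"><page>/home</page><page>/news</page><page></page></session>
  <session id="s2"><page>/home</page><page>/shop</page></session>
  <session id="s3"><page>/news</page><page>/home</page><page>/home</page></session>
</sessions>
"""

def load_sessions(xml_text, conn):
    """Filter the XML records and store them as (session_id, page) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        "session_id TEXT, page TEXT, UNIQUE(session_id, page))"
    )
    root = ET.fromstring(xml_text)
    for session in root.iter("session"):
        sid = session.get("id")
        for page in session.iter("page"):
            url = (page.text or "").strip()
            if not url:
                continue  # filtering step: drop empty entries
            # UNIQUE + OR IGNORE removes repeated visits within a session
            conn.execute("INSERT OR IGNORE INTO visits VALUES (?, ?)", (sid, url))
    conn.commit()

def load_transactions(conn):
    """Group stored rows into one item set per session, ready for rule mining."""
    result = {}
    query = "SELECT session_id, page FROM visits ORDER BY session_id, page"
    for sid, page in conn.execute(query):
        result.setdefault(sid, set()).add(page)
    return result

conn = sqlite3.connect(":memory:")  # stand-in for a SQL Server connection
load_sessions(SAMPLE_XML, conn)
baskets = load_transactions(conn)
row_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Storing one row per (session, item) pair keeps the table in a simple relational form from which transactions for association rule mining can be rebuilt with a single query.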

Abstract:

With the growth of the Internet, Web data mining has become an active research topic. In this paper, the XML representation of unstructured Internet data is filtered and then transformed into structured data stored in a SQL Server database. A minimum association rule set is then produced by association rule discovery on this data warehouse, which effectively prunes weak association rules and substantially reduces the number of candidate frequent itemsets. Finally, based on the classical Apriori algorithm, the upward closure property of weak rules is used to develop a corresponding algorithm that proves to be efficient.

Key words: Web-based data mining, data warehouse, association rule, minimum association rule set
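
The abstract names the ingredients of the algorithm but not its details, so the sketch below is not the authors' method. It shows textbook Apriori together with one standard pruning that may correspond to the "upward closure of weak rules": for a fixed frequent itemset, if a rule with consequent c falls below the confidence threshold, every rule whose consequent is a superset of c does too, so those rules never need to be generated. The function names, the sample data, and the thresholds are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_conf):
    """Generate rules, growing consequents only from those that passed."""
    rules = []
    for itemset, support in frequent.items():
        consequents = {frozenset([item]) for item in itemset}
        size = 1
        while consequents and size < len(itemset):
            kept = set()
            for cons in consequents:
                conf = support / frequent[itemset - cons]
                if conf >= min_conf:
                    rules.append((itemset - cons, cons, conf))
                    kept.add(cons)
            # Weak consequents are dropped here; their supersets are never
            # built, since those rules could only have lower confidence.
            size += 1
            joined = {a | b for a in kept for b in kept if len(a | b) == size}
            consequents = {c for c in joined
                           if all(frozenset(s) in kept
                                  for s in combinations(c, size - 1))}
    return rules

# Illustrative transactions (made up for this example).
data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
freq = apriori(data, min_support=2)
found = strong_rules(freq, min_conf=0.7)
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

In this sample the three rules with a single-item consequent derived from {A, B, C} all fall below the threshold, so no rule with a two-item consequent is ever evaluated for that itemset; only the rules from the two-item frequent itemsets survive.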