• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (12): 94-98.

• Paper •

Research of Web Page Classification Based on Improved Immune Clone Support Vector Machine

ZHANG Suqi1, LIU Enhai2, HE Ya2, DONG Yongfeng2

  1. (1. Tianjin University, Tianjin 300072; 2. School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China)
  • Received: 2011-05-18 Revised: 2011-09-29 Online: 2011-12-24 Published: 2011-12-25

Abstract:

Web page classification is an active research area that grew out of the need to cope with information overload on the web, and the support vector machine (SVM), with its strong learning ability, shows particular advantages on high-dimensional problems. Building on a study of SVMs and the standard immune clone optimization algorithm, this paper proposes a classification algorithm that combines an improved immune clone algorithm with an SVM. The standard algorithm mutates antibodies by randomly inverting some bits of the antibody encoding, which leaves its search capability weak. To address this deficiency, the proposed method distinguishes memory units from ordinary units and defines an adaptive probability for the memory units. This strengthens the search in the neighborhood of the current best solution and speeds up the search for the global optimum. Experimental results show that the improved algorithm selects parameters better and more efficiently than the other algorithms compared, making it a web page classification method with high accuracy and efficiency.

Key words: Web page classification; support vector machine; feature extraction; parameter selection; immune algorithm
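
The mutation scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the antibody encoding nor the formula for the adaptive probability, so everything concrete here is an assumption made for illustration. That covers the 16-bit encoding of two SVM parameters (taken to be C and gamma of an RBF kernel), the search ranges, the rule in `adaptive_probability`, the fixed rate `P_ORDINARY`, and the toy `affinity` function that stands in for the cross-validated accuracy of an SVM.

```python
import random

BITS = 16          # antibody length (assumed): 8 bits per SVM parameter
P_ORDINARY = 0.3   # fixed per-bit flip rate for ordinary units (assumed)


def decode(antibody):
    """Map a bit list to (log2 C, log2 gamma) on an assumed search range."""
    half = BITS // 2
    scale = 2 ** half - 1

    def to_int(bits):
        return int("".join(str(b) for b in bits), 2)

    log_c = -5 + 20 * to_int(antibody[:half]) / scale    # log2 C in [-5, 15]
    log_g = -15 + 18 * to_int(antibody[half:]) / scale   # log2 gamma in [-15, 3]
    return log_c, log_g


def affinity(antibody):
    """Toy stand-in for cross-validated SVM accuracy; peaks at log2 C=5, log2 gamma=-7."""
    log_c, log_g = decode(antibody)
    return -((log_c - 5) ** 2 + (log_g + 7) ** 2)


def adaptive_probability(rank, n_memory, generation, max_gen, p_min=0.01, p_max=0.2):
    """Hypothetical rule: better-ranked memory units and later generations flip fewer bits."""
    return p_min + (p_max - p_min) * (rank / n_memory) * (1 - generation / max_gen)


def mutate(antibody, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in antibody]


def evolve(pop_size=20, n_memory=5, clones=5, generations=40, seed=0):
    """Run the sketch and return the best antibody plus the best affinity per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    history = []
    for gen in range(generations):
        pop.sort(key=affinity, reverse=True)
        history.append(affinity(pop[0]))
        next_pop = []
        for rank, antibody in enumerate(pop):
            if rank < n_memory:
                # Memory units: small, adaptive flips search near the current best.
                p = adaptive_probability(rank, n_memory, gen, generations)
            else:
                # Ordinary units: fixed random bit inversion, as in the standard algorithm.
                p = P_ORDINARY
            candidates = [antibody] + [mutate(antibody, p, rng) for _ in range(clones)]
            next_pop.append(max(candidates, key=affinity))  # clonal selection keeps the best
        pop = next_pop
    best = max(pop, key=affinity)
    history.append(affinity(best))
    return best, history
```

Calling `best, history = evolve(seed=1)` and then `decode(best)` returns the selected pair of log2 parameters. Because each parent competes with its own clones, the best affinity in `history` never decreases. In a real experiment, `affinity` would be replaced by the validation accuracy of an SVM trained on the web page features with the decoded parameters.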