• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2013, Vol. 35, Issue (12): 173-177.


Anti-noise word attack spam filtering model based on artificial immune system

WANG Xiaowei1, GUO Hongtao2, WANG Zhongfeng3

  (1. Modern Education Technology Center, Physical Education College of Zhengzhou University, Zhengzhou 450044;
   2. Software College, North China University of Water Resources and Electric Power, Zhengzhou 450011;
   3. Safety and Emergency Management Lab, Beijing Municipal Institute of Labour Protection, Beijing 100054, China)
  • Received: 2012-06-18 Revised: 2013-09-14 Online: 2013-12-25 Published: 2013-12-25
  • Funding: Key Science and Technology Research Project of the Education Department of Henan Province (12B520056, 13B520253); Youth Fund Project of the Physical Education College of Zhengzhou University (2011C3003)

Abstract:

Current spam filtering algorithms based on artificial immune systems rarely take noise word attacks into account. To address this problem, an immune-based training algorithm for an anti-noise-word-attack spam filtering model, named ANWAIS, is proposed. In the gene library generation stage, the algorithm uses the mutual information difference as its evaluation function to discard good words that appear in spam and spam words that appear in normal email, so that the gene library better reflects the characteristics of spam. In the antibody update stage, it maintains a discard word table to keep the gene library pure. Simulation experiments show that ANWAIS obtains higher-quality antibodies and better classification performance than spam filtering algorithms that do not consider noise word attacks.

Key words: artificial immune system; noise word attack; spam filtering; mutual information difference; gene library
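The abstract names two mechanisms, an evaluation function based on the mutual information difference during gene library generation and a discard word table during antibody update, but does not give their formulas. The sketch below is a minimal illustration under stated assumptions, not the ANWAIS implementation: it takes "mutual information difference" to mean pointwise mutual information with spam minus pointwise mutual information with ham, estimated from document frequencies with add-one smoothing; it uses a score threshold of 0; and it models an antibody and the discard word table as plain sets of words. The function names (`mutual_information_difference`, `build_gene_library`, `update_antibody`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def doc_frequencies(emails):
    """Count the number of emails in which each word appears.

    Tokenization is a simple lower-case whitespace split (an assumption).
    """
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))
    return counts


def mutual_information_difference(spam, ham, alpha=1.0):
    """Score each word as PMI(word, spam) - PMI(word, ham).

    ASSUMPTION: PMI(w, c) = log P(w|c) - log P(w), with P(w|c) estimated from
    document frequencies using add-alpha smoothing. The log P(w) terms cancel,
    so the score reduces to log P(w|spam) - log P(w|ham).
    """
    spam_df, ham_df = doc_frequencies(spam), doc_frequencies(ham)
    scores = {}
    for word in set(spam_df) | set(ham_df):
        p_spam = (spam_df[word] + alpha) / (len(spam) + 2 * alpha)
        p_ham = (ham_df[word] + alpha) / (len(ham) + 2 * alpha)
        scores[word] = math.log(p_spam) - math.log(p_ham)
    return scores


def build_gene_library(spam, ham, threshold=0.0, max_genes=None):
    """Return (genes, discard_table) built from labelled training mail.

    Words scoring above `threshold` become genes, highest score first.
    Words at or below it, including good words planted in spam that are at
    least as common in ham (after smoothing), go into the discard word table.
    """
    scores = mutual_information_difference(spam, ham)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    genes = [w for w in ranked if scores[w] > threshold]
    if max_genes is not None:
        genes = genes[:max_genes]
    discard_table = {w for w in scores if scores[w] <= threshold}
    return genes, discard_table


def update_antibody(antibody, new_spam_words, discard_table):
    """Extend an antibody (a set of words) with words from newly confirmed
    spam, skipping anything in the discard table so that noise words cannot
    re-enter the gene pool."""
    new_words = {w.lower() for w in new_spam_words}
    return set(antibody) | (new_words - discard_table)


if __name__ == "__main__":
    spam = ["cheap pills buy now", "buy cheap watches now", "meeting cheap pills"]
    ham = ["meeting agenda for monday", "project meeting notes", "lunch on monday"]
    genes, discard = build_gene_library(spam, ham)
    print(genes)  # ['cheap', 'buy', 'now', 'pills', 'watches']
    print(sorted(update_antibody({"cheap"}, ["meeting", "pills"], discard)))
    # ['cheap', 'pills']
```

In the toy data, "meeting" is planted once in a spam message but appears in two ham messages, so its score is negative. It never becomes a gene, it lands in the discard table, and `update_antibody` later refuses to add it even when it shows up in new spam.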