• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An image spam filtering method based on integrated learning

ZHAO Jun-sheng,HOU Sheng,WANG Xin-yu,YIN Yu-jie   

  1. (College of Information Engineering,Inner Mongolia University of Technology,Hohhot 010080,China)
  • Received:2019-08-10 Revised:2020-01-10 Online:2020-06-25 Published:2020-06-25

Abstract:

Currently, the majority of image spam filtering techniques are trained on a common international image spam data set. This data set is rarely updated and differs in its characteristics from domestic Chinese image spam. In addition, only one type of classifier is employed, which limits filtering performance. To address these issues, a domestic image spam database is first constructed, and the color, texture, and shape features of the images are extracted. Then, the K-NN classification algorithm is used to select the HSV color histogram features for training, testing, and performance comparison of different classifiers. Finally, a serial iterative improvement method integrating rough set-based K-NN, Naive Bayes, and SVM is proposed to form a strong integrated learning classifier that can effectively filter domestic image spam. The accuracy and recall of image spam filtering are improved to 97.3% and 96.1% respectively, and the false positive rate is reduced to 2.7%.
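To illustrate the feature the abstract singles out, the following is a minimal sketch of an HSV color histogram fed to a plain K-NN classifier. It is not the authors' implementation: the bin counts (8 x 4 x 4), the squared Euclidean distance, k = 3, and the tiny synthetic "images" (flat lists of RGB pixels) are all assumptions made for illustration, and the rough set step and the Naive Bayes and SVM stages of the integrated classifier are omitted.

```python
import colorsys
from collections import Counter


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Return a normalized HSV histogram for (r, g, b) pixels in 0..255.

    The bin layout is an illustrative assumption, not taken from the paper.
    """
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a channel value of exactly 1.0 lands in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
    total = sum(hist)
    return [x / total for x in hist] if total else hist


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; squared Euclidean
    distance is an assumed choice of metric.
    """
    ranked = sorted(
        ((sum((a - b) ** 2 for a, b in zip(feat, query)), label)
         for feat, label in train),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: saturated images stand in for spam, gray ones for ham.
train = [
    (hsv_histogram([(255, 0, 0)] * 10), "spam"),
    (hsv_histogram([(255, 128, 0)] * 10), "spam"),
    (hsv_histogram([(128, 128, 128)] * 10), "ham"),
    (hsv_histogram([(140, 140, 140)] * 10), "ham"),
]

print(knn_predict(train, hsv_histogram([(250, 10, 10)] * 10)))
print(knn_predict(train, hsv_histogram([(130, 130, 130)] * 10)))
```

In practice the pixel lists would come from decoded image files, and the histogram would be one of several candidate features compared across classifiers, as the abstract describes.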
 

Key words: image spam filtering, image classification, integrated learning, K-NN algorithm, HSV color histogram