• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


SMS scam user identification based on SPARK and random forest

YANG Jiechao,XU Jiangchun,YUE Qiuyan,ZENG Debin,LU Wanrong   

  1. (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2018-08-13; Revised: 2018-12-24; Online: 2019-06-25; Published: 2019-06-25

Abstract:

SMS scams are becoming increasingly common in the big-data era. To identify SMS scam users before they commit fraud, and in light of current telecom industry demand and the state of research, we propose a hierarchical-subspace weighted random forest algorithm built on the SPARK parallel processing framework. Because the diversity of SMS users produces imbalanced class distributions, which degrade random forest performance, we adopt an improved hierarchical subspace method and weight each decision tree according to an evaluation of its classification ability. The proposed algorithm outperforms the other classification algorithms it is compared against. Given the massive data volumes of the telecom industry, we select the distributed SPARK platform for data processing. Parallelization not only improves the efficiency of the algorithm but also reduces the training and testing time of the model. The method identifies telecom SMS scam users accurately and in real time, with an accuracy above 90%.
 

Key words: SPARK, random forest, hierarchical subspace, weighted, SMS scam user identification
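The abstract does not spell out the exact hierarchical-subspace or tree-weighting rules, so the following is only a minimal single-machine sketch under stated assumptions, not the authors' implementation. It assumes (1) features are ranked by an informativeness score and split at the median into a "strong" and a "weak" stratum, with each tree's feature subspace drawn from both strata, and (2) each tree's vote is weighted by its balanced accuracy (mean per-class recall) on held-out data, chosen here because plain accuracy rewards trees that ignore the minority scam class. The function names `stratified_subspace`, `balanced_accuracy`, `tree_weights` and `weighted_vote` are hypothetical, tree training itself is omitted, and the SPARK parallelization (for example, training trees as independent tasks) is not shown.

```python
import random
from collections import defaultdict


def stratified_subspace(feature_scores, k, strong_fraction=0.5, rng=None):
    """Hypothetical hierarchical (stratified) subspace sampler.

    Ranks features by an informativeness score (assumed, e.g. information
    gain), splits them at the median into a 'strong' and a 'weak' stratum,
    draws about k * strong_fraction features from the strong stratum and
    the rest from the weak one. Returns sorted feature indices.
    """
    if rng is None:
        rng = random.Random()
    ranked = sorted(range(len(feature_scores)),
                    key=lambda i: feature_scores[i], reverse=True)
    half = len(ranked) // 2
    strong, weak = ranked[:half], ranked[half:]
    n_strong = min(len(strong), round(k * strong_fraction))
    n_weak = min(len(weak), k - n_strong)
    return sorted(rng.sample(strong, n_strong) + rng.sample(weak, n_weak))


def balanced_accuracy(preds, y_true):
    """Mean per-class recall, so a minority class counts as much as a majority one."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p, y in zip(preds, y_true):
        totals[y] += 1
        hits[y] += (p == y)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def tree_weights(tree_val_predictions, y_val):
    """Assumed weighting rule: each tree's weight is its balanced accuracy on validation data."""
    return [balanced_accuracy(preds, y_val) for preds in tree_val_predictions]


def weighted_vote(tree_predictions, weights):
    """Combine the trees' predictions for one sample by weighted majority vote."""
    scores = defaultdict(float)
    for pred, w in zip(tree_predictions, weights):
        scores[pred] += w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    y_val = [0, 0, 0, 0, 1, 1]               # 1 = scam user (minority class)
    val_preds = [
        [0, 0, 0, 0, 0, 0],                  # tree that always predicts "not scam"
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 0],
    ]
    weights = tree_weights(val_preds, y_val)
    print(weights)                           # [0.5, 0.875, 0.5]
    print(weighted_vote([0, 1, 1], weights)) # 1
    print(stratified_subspace([0.9, 0.1, 0.8, 0.2, 0.7, 0.3], k=4,
                              rng=random.Random(0)))
```

In this toy example the tree that always predicts "not scam" would score 4/6 under plain accuracy, but balanced accuracy drops it to 0.5 because it recalls none of the scam users, so it carries less weight in the vote than the tree that catches both.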