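• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science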


A Uyghur spam classification method based on deep belief networks

Aliya·Aierken, Halidan·Abudureyimu, HE Yan, WU Bing-bing

  1. (College of Electrical Engineering, Xinjiang University, Urumqi 830047, China)
  • Received: 2015-07-07 Revised: 2015-11-15 Online: 2016-10-25 Published: 2016-10-25
  • Funding: National Natural Science Foundation of China (61163026)

Abstract:

Traditional classification algorithms applied to Uyghur text suffer from low accuracy and long running times. We therefore propose a Uyghur short-message (SMS) text classification model based on a deep belief network (DBN). Deep learning mimics the multi-layered structure of the human brain: it extracts features progressively, from low-level to high-level representations, and exploits the underlying distribution of the data set, which improves classification accuracy. The DBN is initialized by layer-wise unsupervised training, and a softmax regression classifier on top of it performs the text classification. Experiments on a collected Uyghur SMS data set show that, compared with the KNN, SVM, and decision tree algorithms, the DBN achieves better classification results and higher accuracy.

Key words: deep belief networks (DBNs), Uyghur, spam SMS, text classification
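The abstract describes two stages: layer-wise unsupervised initialization of the DBN, followed by a softmax regression classifier on top. The pure-Python sketch below illustrates that pipeline under stated assumptions. It assumes the standard construction in which each DBN layer is a Bernoulli restricted Boltzmann machine (RBM) trained with one-step contrastive divergence (CD-1), that each message is encoded as a binary bag-of-words vector, and that only the softmax layer is trained with labels (the RBM weights are not fine-tuned here). The helper names (`RBM`, `pretrain_dbn`, `train_softmax`, `predict`), the six-term toy vocabulary, and all layer sizes and hyperparameters are hypothetical and are not taken from the paper.

```python
import math
import random

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, rng):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        pv1 = self.visible_probs(h0)                                # one-step reconstruction
        ph1 = self.hidden_probs(pv1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])

def pretrain_dbn(data, layer_sizes, epochs, lr, rng):
    """Layer-wise unsupervised pre-training: each RBM models the layer below it."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(x[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in x:
                rbm.cd1_update(v, lr)
        x = [rbm.hidden_probs(v) for v in x]  # activations feed the next layer
        rbms.append(rbm)
    return rbms, x

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]

def train_softmax(features, labels, n_classes, epochs, lr):
    """Softmax regression fitted by per-sample gradient descent on cross-entropy."""
    d = len(features[0])
    W = [[0.0] * d for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax([bias[k] + sum(W[k][i] * x[i] for i in range(d))
                         for k in range(n_classes)])
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                bias[k] -= lr * g
                for i in range(d):
                    W[k][i] -= lr * g * x[i]
    return W, bias

def predict(rbms, W, bias, v):
    x = v
    for rbm in rbms:  # propagate the input through the pretrained stack
        x = rbm.hidden_probs(x)
    scores = [bias[k] + sum(W[k][i] * x[i] for i in range(len(x))) for k in range(len(W))]
    return max(range(len(W)), key=lambda k: scores[k])

# Hypothetical toy data: binary bag-of-words over a 6-term vocabulary.
# Terms 0-2 stand in for "spam-like" words, terms 3-5 for "normal" words.
X = [[1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = normal

rng = random.Random(0)
rbms, top_features = pretrain_dbn(X, layer_sizes=[4, 3], epochs=300, lr=0.1, rng=rng)
W, bias = train_softmax(top_features, y, n_classes=2, epochs=500, lr=0.5)
preds = [predict(rbms, W, bias, v) for v in X]
```

In a real setting, `X` would be replaced by term-presence vectors built from segmented Uyghur messages. Keeping the unsupervised stage separate from the supervised softmax stage mirrors the two-step structure described in the abstract; a fuller implementation would typically use an array library and mini-batches rather than per-sample Python loops.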