• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4, 2008, Vol. 30, Issue (12): 82-84.


Chinese Text Classification Based on Ball Vector Machines

LU Zu-you, SANG Yong-sheng

  • Publication date: 2008-12-01  Posted online: 2010-05-19


Abstract:

The application of support vector machines (SVMs) to text classification has been one of the major advances in the field in recent years. Many experiments show that SVMs achieve higher classification accuracy in text classification than other machine learning algorithms, but their training converges slowly on large-scale data, which is a major drawback in practical applications. The ball vector machine (BVM) is a machine learning method that trains faster than the SVM. This paper applies BVM to text categorization. Experiments on real-world text data sets show that BVM achieves accuracy comparable to that of SVM while requiring much less training time.

Key words: text classification, SVM, BVM
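The abstract states that BVM trains faster than SVM but does not describe the algorithm. As BVM is commonly described (an assumption here, not taken from this paper), SVM training is recast as finding a ball of a known, fixed radius r that encloses all kernel-mapped training points to within a factor (1 + ε), rather than computing a minimum enclosing ball. The Python sketch below illustrates only that geometric core under simplifying assumptions: points stay in the input space (a linear kernel), the farthest point is found by a full scan instead of sampling, and the names `enclosing_ball` and `dist` are hypothetical, not the authors' code.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enclosing_ball(points, r, eps=0.1):
    """Hypothetical sketch: return c with dist(c, p) <= (1 + eps) * r for every p.

    Assumes some ball of radius r encloses all points. Works in the input
    space (linear kernel); BVM itself operates on kernel-mapped points.
    """
    c = [float(x) for x in points[0]]  # start at an arbitrary training point
    # If a radius-r ball centered at c* exists, each update below shrinks
    # ||c - c*||^2 by more than (eps * r)^2, so fewer than 1 / eps^2 updates occur.
    max_updates = math.ceil(1 / eps ** 2) + 1
    for _ in range(max_updates + 1):
        # Full scan for the point farthest from the current center.
        far = max(points, key=lambda p: dist(c, p))
        d = dist(c, far)
        if d <= (1 + eps) * r:
            return c  # every point lies inside the (1 + eps) * r ball
        # Move c toward `far` until `far` sits exactly on the radius-r sphere.
        step = 1 - r / d
        c = [ci + step * (fi - ci) for ci, fi in zip(c, far)]
    # Exceeding the update bound means no radius-r ball exists (up to rounding).
    raise ValueError("no ball of radius r encloses all points")


print(enclosing_ball([(0, 0), (2, 0), (1, 1), (1, -1)], 1.0))  # [1.0, 0.0]
```

On the four-point example, the single update moves the center from (0, 0) to (1, 0), where every point lies at distance exactly 1. In a kernel setting c cannot be stored explicitly; because each update makes the new center a convex combination of the old center and one training point, c can instead be kept as weights over training points, with distances computed from kernel evaluations. That bookkeeping is left out of this sketch.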