• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (01): 168-172.


Two-class text categorization using nearest subspace search

LI Yujian (李玉鑑), WANG Ying (王影), LENG Qiangkui (冷强奎)

  1. (College of Computer Science, Beijing University of Technology, Beijing 100124, China)
  • Received: 2013-04-26 Revised: 2013-07-03 Online: 2015-01-25 Published: 2015-01-25
  • Supported by:

    National Natural Science Foundation of China (61175004); Beijing Natural Science Foundation (4112009); Science and Technology Development Project of the Beijing Municipal Education Commission (KZ201210005007); Specialized Research Fund for the Doctoral Program of Higher Education (20121103110029)

Abstract:

Nearest neighbor search is a simple and accurate method for text categorization, but it usually requires a large amount of computation during classification. To overcome this drawback, a two-class text categorization method based on nearest subspace search is proposed. The method first extracts a feature subspace from the sample vectors of each class and maps each subspace to a point in a higher-dimensional space; nearest subspace search is then converted into nearest neighbor search to carry out classification. Experiments on the Reuters-21578 data set show that the proposed method effectively improves the performance of nearest neighbor search in text categorization, achieving higher precision, recall, and F1 values.

Key words: text categorization; nearest subspace search; nearest neighbor search
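
The abstract does not spell out how the feature subspaces are extracted or how they are mapped to points, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each class subspace is the span of the top-k right singular vectors of that class's sample matrix, that both classes use the same k, and that a subspace with orthonormal basis U is mapped to the flattened projection matrix U Uᵀ while a query q is mapped to the flattened q qᵀ. Under these assumptions the squared Euclidean distance between the mapped points equals k + ‖q‖⁴ − 2‖q‖² + 2·dist²(q, S), so the nearest mapped point corresponds to the nearest subspace. All function names are hypothetical.

```python
import numpy as np


def feature_subspace(X, k):
    """Orthonormal basis (d x k) spanning the top-k right singular vectors of X (n x d).

    Assumption: SVD is used here for illustration; the paper's extraction may differ.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def subspace_to_point(U):
    """Map a subspace to a point in R^(d*d): the flattened projection matrix U U^T."""
    return (U @ U.T).ravel()


def query_to_point(q):
    """Map a query vector to the same space: the flattened outer product q q^T."""
    return np.outer(q, q).ravel()


def subspace_distance_sq(q, U):
    """Squared distance from q to the subspace spanned by the columns of U."""
    return float(q @ q - np.sum((U.T @ q) ** 2))


def classify(q, class_points):
    """Return the index of the class whose mapped point is nearest to the mapped query."""
    v = query_to_point(q)
    dists = [np.linalg.norm(v - p) for p in class_points]
    return int(np.argmin(dists))


# Toy data (made up for illustration): class 0 lies in span(e1, e2),
# class 1 lies in span(e3, e4).
X0 = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
X1 = np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
k = 2
bases = [feature_subspace(X0, k), feature_subspace(X1, k)]
points = [subspace_to_point(U) for U in bases]

print(classify(np.array([0.9, 0.1, 0.2, 0.0]), points))  # prints 0
print(classify(np.array([0.0, 0.1, 0.5, 0.5]), points))  # prints 1
```

With this construction the per-class work is done once when the subspaces are mapped; each query then needs only one outer product and one distance per class, and any standard nearest neighbor index could be used over the mapped points.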