• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (02): 340-346.

• Articles •

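Text categorization based on resource allocating network and semantic feature selection

HE Xiaoliang1,2, SONG Wei1, LIANG Jiuzhen1

  1. (1. School of IoT Engineering, Jiangnan University, Wuxi, Jiangsu 214122; 2. Traffic Management Research Institute, Ministry of Public Security, Wuxi, Jiangsu 214151)
  • Received: 2012-08-13 Revised: 2012-10-08 Online: 2014-02-25 Published: 2014-02-25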
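  • Supported by:

    Young Scientists Fund of the National Natural Science Foundation of China (61103129); Specialized Research Fund for the Doctoral Program of Higher Education, New Teacher category (20100093120004); Fundamental Research Funds for the Central Universities (JUSRP11130); Natural Science Foundation of Jiangsu Province (SBK201122266)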

Text categorization based on resource allocating network and semantic feature selection        

HE Xiaoliang1,2,SONG Wei1,LIANG Jiuzhen1   

  1. 1.School of IoT Engineering,Jiangnan University,Wuxi 214122;
    2.Traffic Management Research Institute,Ministry of Public Security,Wuxi 214151,China)
  • Received:2012-08-13 Revised:2012-10-08 Online:2014-02-25 Published:2014-02-25

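Abstract:

In the resource allocating network (RAN) algorithm, the hidden-layer nodes are strongly affected by the initial training data and convergence is slow. To address these problems, a new RAN learning algorithm is proposed. The initial hidden-layer nodes are determined by the K-means algorithm, and an RMS window is added to the original "novelty criterion" so that the decision on whether to add a hidden-layer node is made more reliably. In addition, the least mean squares (LMS) algorithm is combined with the extended Kalman filter (EKF) algorithm to adjust the network parameters, which speeds up learning. Because text models based on the word vector space struggle with the high dimensionality and semantic complexity of text, a semantic feature selection method is used to extract semantic features from the text input space and reduce its dimensionality. Experimental results show that the new RAN learning algorithm learns quickly, yields a compact network structure, and classifies well. Moreover, semantic feature selection reduces dimensionality at the same time, which greatly shortens text classification time and effectively improves classification accuracy.

Keywords: RAN learning algorithm, radial basis function, semantic feature selection, extended Kalman filter algorithm, least mean squares algorithm, text categorization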

Abstract:

The resource allocating network (RAN) learning algorithm suffers from hidden nodes that depend heavily on the initial learning data and from a low convergence rate. To address these problems, a new RAN learning algorithm is proposed. The initial hidden-layer nodes are determined by the K-means algorithm, and an 'RMS window' is added to the novelty rule to better judge whether a hidden-layer node should be added. Meanwhile, the network parameters are adjusted by combining the least mean squares (LMS) algorithm with the extended Kalman filter (EKF) algorithm, which improves the learning speed. Since word-space text models have difficulty handling the high dimensionality and semantic complexity of texts, a semantic feature selection method is used to extract semantic features from the text input space and reduce its dimension. The experimental results show that the new RAN algorithm offers fast learning, a compact network structure, and good classification performance. Moreover, semantic feature selection reduces both the dimension and the categorization time while effectively raising the accuracy of the categorization system.

Key words: RAN learning algorithm; radial basis function; semantic feature selection; extended Kalman filter algorithm; least mean squares algorithm; text categorization
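
The growth test mentioned in the abstract (a novelty rule extended with an RMS window) can be illustrated with a minimal sketch. The abstract does not give the exact form or the thresholds, so this sketch assumes a common formulation: a hidden node is added only when the current output error, the distance from the input to the nearest existing center, and the RMS error over a sliding window of recent samples all exceed their thresholds. The class name `GrowthCriterion` and the parameters `e_min`, `eps`, `e_rms_min`, and `window` are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


class GrowthCriterion:
    """Assumed growth test for a RAN-style network (not the paper's exact rule)."""

    def __init__(self, e_min, eps, e_rms_min, window):
        self.e_min = e_min            # threshold on the current output error
        self.eps = eps                # threshold on distance to the nearest center
        self.e_rms_min = e_rms_min    # threshold on the windowed RMS error
        self.recent_errors = deque(maxlen=window)  # sliding window of errors

    def should_add_node(self, error, nearest_dist):
        """Return True if all three (assumed) conditions hold for this sample."""
        self.recent_errors.append(error)
        rms = math.sqrt(
            sum(e * e for e in self.recent_errors) / len(self.recent_errors)
        )
        return (
            error > self.e_min
            and nearest_dist > self.eps
            and rms > self.e_rms_min
        )


# Each sample is (output error, distance to nearest center); values are made up.
crit = GrowthCriterion(e_min=0.1, eps=0.5, e_rms_min=0.2, window=3)
samples = [(0.0, 1.0), (0.0, 1.0), (0.3, 1.0), (0.3, 1.0)]
decisions = [crit.should_add_node(e, d) for e, d in samples]
print(decisions)  # [False, False, False, True]
```

In this toy run the first large error is rejected because the windowed RMS error is still below its threshold, and a node is added only once the error persists. This is the kind of behavior an RMS window is generally meant to provide, namely avoiding new hidden nodes in response to isolated noisy samples.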