• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

An MLFM-MN short text classification algorithm based on TNG feature extension

WEN Wu1,2,3, LI Pei-qiang1,2, GUO You-qing1,2

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Research Center of New Communication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  3. Chongqing Xinke Design Co., Ltd., Chongqing 401121, China
  • Received: 2018-12-11  Revised: 2019-04-25  Online: 2019-11-25  Published: 2019-11-25

Abstract:

In massive short-text collections, sparse features and high data dimensionality prevent traditional text classification methods from reaching satisfactory classification speed and accuracy. To address this problem, we propose a multi-level fuzzy min-max neural network (MLFM-MN) short text classification algorithm based on Topic N-Gram (TNG) feature extension. First, an improved TNG model is used to construct a feature extension library and to extend the features; the library can infer not only the word distribution but also the phrase distribution of each topic. Second, the topic tendency of each short text is computed from its original features, and appropriate candidate words and phrases are selected from the feature extension library according to that tendency and added to the original text. Finally, the extended texts are classified by the MLFM-MN algorithm, and precision, recall and F1 score are used to evaluate classification performance. Experimental results show that the proposed algorithm significantly improves text classification performance.

Key words: sparse features, TNG model, fuzzy neural network, extension library, topic tendency
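
The feature extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the library contents, the weights, the scoring rule (summing the weights of matching terms) and the names `EXTENSION_LIBRARY`, `topic_tendency` and `extend_text` are all assumptions made for illustration. In the paper the library is inferred by the improved TNG model, whereas here it is a hand-written dictionary.

```python
# Hypothetical feature extension library: topic -> {word or phrase: weight}.
# The paper derives this from an improved TNG model; these entries are made up.
EXTENSION_LIBRARY = {
    "sports": {"football": 0.30, "world cup": 0.25, "goal": 0.20, "league": 0.15},
    "finance": {"stock": 0.30, "interest rate": 0.25, "bank": 0.20, "market": 0.15},
}


def topic_tendency(tokens, library):
    """Return a normalised score per topic for a short text.

    Assumed scoring rule: sum the library weights of the tokens that
    appear under each topic, then divide by the total so scores sum to 1.
    Returns an empty dict when no token matches any topic.
    """
    scores = {
        topic: sum(weights.get(tok, 0.0) for tok in tokens)
        for topic, weights in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {}
    return {topic: score / total for topic, score in scores.items()}


def extend_text(tokens, library, top_k=2):
    """Append up to top_k candidate terms from the most likely topic."""
    tendency = topic_tendency(tokens, library)
    if not tendency:
        return list(tokens)  # no topical evidence, leave the text unchanged
    best_topic = max(tendency, key=tendency.get)
    present = set(tokens)
    ranked = sorted(library[best_topic].items(), key=lambda kv: -kv[1])
    extras = [term for term, _ in ranked if term not in present][:top_k]
    return list(tokens) + extras


if __name__ == "__main__":
    print(extend_text(["goal", "league", "tonight"], EXTENSION_LIBRARY))
```

The extended token list would then be vectorised and passed to the classifier. The sketch extends only with the single most likely topic; weighting candidates from several topics by their tendency is an equally plausible reading of the abstract.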