• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (07): 1308-1315.


A text classification method combining word statistical characteristics and semantic information

ZHANG Li, MA Jing

  1. (School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
  • Received: 2020-03-03 Revised: 2020-07-21 Accepted: 2021-07-25 Online: 2021-07-25 Published: 2021-08-17

Abstract: To better represent the semantic information of text and improve the accuracy of text classification, this paper improves the feature weight calculation method and combines a feature vector with a semantic vector for text representation. First, text features are extracted from a complex network built over the text. Second, the statistical characteristics of the network nodes are used to improve the TF-IDF weighting algorithm, which yields the feature vector. Third, a long short-term memory (LSTM) network is used to obtain the semantic vector. Finally, the feature vector is fused with the semantic vector so that the resulting text representation is more discriminative. Experiments are conducted on online news data. The results show that the improved feature weighting algorithm, by introducing semantic and structural information into the feature vector and fusing the feature vector with the semantic vector, further enriches the text information and improves text classification performance.
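The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formulas, so everything specific here is an assumption made for illustration: the text network is taken to be a sliding-window word co-occurrence graph, the node statistic is plain degree, the TF-IDF weight is scaled by a hypothetical factor `1 + degree / max_degree`, and the semantic vector is a hard-coded placeholder standing in for an LSTM output. All function names (`cooccurrence_degrees`, `weighted_tfidf`, `fuse`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_degrees(tokens, window=2):
    """Build an undirected word co-occurrence network; return each node's degree."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words (assumed network construction).
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return {word: len(neighbors[word]) for word in set(tokens)}


def weighted_tfidf(docs, window=2):
    """Return (vocab, vectors): smoothed TF-IDF scaled by a degree-based factor."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        degrees = cooccurrence_degrees(doc, window)
        max_degree = max(degrees.values(), default=0) or 1
        vec = []
        for w in vocab:
            if w not in tf:
                vec.append(0.0)
                continue
            idf = math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0
            # Assumed structural factor: well-connected nodes get more weight.
            structure = 1.0 + degrees[w] / max_degree
            vec.append((tf[w] / len(doc)) * idf * structure)
        vectors.append(vec)
    return vocab, vectors


def fuse(feature_vec, semantic_vec):
    """Fuse by simple concatenation (one possible integration strategy)."""
    return list(feature_vec) + list(semantic_vec)


docs = [
    ["stock", "market", "rises", "stock", "gains"],
    ["team", "wins", "match", "team"],
]
vocab, vectors = weighted_tfidf(docs)
semantic = [0.1, -0.3, 0.7]  # placeholder for an LSTM's final hidden state
fused = fuse(vectors[0], semantic)
print(len(vocab), len(fused))
```

In this sketch a word that both occurs often and connects to many distinct neighbors receives a larger weight than plain TF-IDF would assign. In a real system the placeholder `semantic` vector would come from a trained LSTM encoder, and the fused vector would be passed to a classifier.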


Key words: text classification, text complex network, feature weight, long short-term memory (LSTM)