• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A news keyword extraction method combining LSTM and LDA differences

NING Shan 1,2, YAN Xin 1,2, ZHOU Feng 1,2, WANG Hong-bin 1,2, ZHANG Jin-peng 3

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  3. Center of Information Management, Yunnan University of Finance and Economics, Kunming 650221, China
     
  • Received: 2019-03-08  Revised: 2019-05-21  Online: 2020-01-25  Published: 2020-01-25

Abstract:

To account for the influence of semantic information on TextRank, and considering both the high information density of news headlines and the coverage and difference characteristics of keywords, this paper proposes a news keyword extraction method that combines LSTM and LDA differences. First, the news text is preprocessed to obtain candidate keywords. Second, the topic difference influence degree of each candidate keyword is obtained from the LDA topic model. Third, an LSTM model and a word2vec model are combined to compute the semantic relevance between each candidate keyword and the title. Finally, based on the topic difference influence degree and the semantic relevance influence degree, rank is transferred non-uniformly among the candidate keyword nodes to produce the final candidate ranking, from which the keywords are extracted. The proposed method thus combines several keyword attributes: semantic importance, coverage, and difference. Experimental results on the Sogou news corpus show that, compared with the traditional method, the proposed method significantly improves accuracy and recall.
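The final ranking step can be illustrated with a minimal sketch of a TextRank-style walk whose transitions are non-uniform. This is not the authors' formulation: the abstract does not give the formulas, so the linear mix controlled by `alpha`, the function name `biased_textrank`, and the toy scores below are all assumptions made for illustration. In the paper the two scores would come from the LDA topic model and from the LSTM/word2vec title relevance; here they are hand-written placeholder numbers.

```python
def biased_textrank(neighbors, topic_score, title_score,
                    alpha=0.5, damping=0.85, iterations=50):
    """Rank candidate words with a TextRank-style walk using non-uniform transitions.

    neighbors: dict mapping each word to the set of words it co-occurs with
               (assumed symmetric, i.e. an undirected graph).
    topic_score, title_score: dicts mapping each word to a non-negative score.
    alpha: hypothetical mixing weight between the two scores (an assumption).
    """
    words = list(neighbors)
    # Combined influence per word; the linear mix is an illustrative assumption.
    influence = {w: alpha * topic_score[w] + (1 - alpha) * title_score[w]
                 for w in words}
    rank = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_rank = {}
        for w in words:
            incoming = 0.0
            for v in neighbors[w]:
                # v passes its rank to w in proportion to w's share of the
                # total influence among v's neighbours (non-uniform transfer).
                total = sum(influence[u] for u in neighbors[v])
                if total > 0:
                    incoming += rank[v] * influence[w] / total
            new_rank[w] = (1 - damping) / len(words) + damping * incoming
        rank = new_rank
    # Highest-ranked candidates first.
    return sorted(words, key=lambda w: rank[w], reverse=True)


# Toy co-occurrence graph and placeholder scores (not data from the paper).
neighbors = {
    "economy": {"growth", "report"},
    "growth": {"economy", "report"},
    "report": {"economy", "growth"},
}
topic_score = {"economy": 0.6, "growth": 0.3, "report": 0.1}
title_score = {"economy": 0.8, "growth": 0.5, "report": 0.1}

ranking = biased_textrank(neighbors, topic_score, title_score)
print(ranking)
```

With uniform scores this reduces to an ordinary unweighted TextRank walk; the non-uniform weights are what let topic difference and title relevance shift rank toward some candidates over others.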
 

Key words: keyword extraction, news headline, TextRank algorithm, word2vec model, LDA model

CLC Number: