  • Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)


Chinese keywords identification based on strength entropy (基于强度熵的中文关键词识别方法)

YAN Rong (闫蓉), GAO Guanglai (高光来)

  1. (College of Computer Science, Inner Mongolia University, Hohhot 010021, China)
  • Received: 2016-07-03 Revised: 2016-09-02 Online: 2016-11-25 Published: 2016-11-25
  • Supported by:

    National Natural Science Foundation of China (61263037, 61662053); Natural Science Foundation of Inner Mongolia (2014BS0604)

Abstract:

Keyword identification is one of the fundamental problems in text mining. Building on a
study of existing keyword identification approaches based on complex networks, we examine
the importance of each node from the perspective of information loss in the topological
features of the whole complex network. We propose a measure, called strength entropy, to
quantify the importance of each node, and apply it to the problem of Chinese keyword
identification. Experimental results show that the evaluation method is simple and
effective, and that it is particularly well suited to evaluating node importance in
weighted complex networks.

Key words: complex network, keyword extraction, language network, strength entropy
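
The abstract does not state the formula for strength entropy, so the following is only an illustrative sketch and should not be read as the authors' formulation. It assumes (1) a word co-occurrence network built from an already segmented text, (2) node strength defined as the sum of the weights of a node's edges, (3) strength entropy defined as the Shannon entropy of the normalized strength distribution, and (4) node importance measured as the change in that entropy when the node is removed. All function names and the `span` parameter are hypothetical.

```python
import math
from collections import defaultdict


def build_cooccurrence_network(tokens, span=1):
    """Build a weighted, undirected word network (assumed construction).

    Each token is linked to the `span` tokens that follow it; the edge
    weight counts how often the two words co-occur in that window.
    """
    weights = defaultdict(float)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + span]:
            if a != b:  # skip self-loops
                weights[frozenset((a, b))] += 1.0
    return dict(weights)


def node_strengths(edges):
    """Node strength: sum of the weights of all edges incident to a node."""
    strength = defaultdict(float)
    for pair, w in edges.items():
        for node in pair:
            strength[node] += w
    return dict(strength)


def strength_entropy(edges):
    """Shannon entropy of the normalized strength distribution (assumed definition)."""
    strengths = node_strengths(edges)
    total = sum(strengths.values())
    if total == 0:
        return 0.0
    return -sum(
        (s / total) * math.log(s / total) for s in strengths.values() if s > 0
    )


def rank_nodes(edges):
    """Rank nodes by the entropy change caused by removing each one (assumed criterion)."""
    base = strength_entropy(edges)
    nodes = set().union(*edges) if edges else set()
    scores = {}
    for node in nodes:
        remaining = {p: w for p, w in edges.items() if node not in p}
        scores[node] = abs(base - strength_entropy(remaining))
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, candidate keywords would be the top-ranked entries of `rank_nodes(build_cooccurrence_network(tokens))`, where `tokens` is the output of a Chinese word segmenter with stop words removed. Because the score uses edge weights rather than plain degree, it distinguishes nodes that have the same number of neighbours but different co-occurrence counts, which is the setting the abstract highlights for weighted networks.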