• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

Short text representation learning based on semantic feature space context

TUO Ting1, MA Huifang1,2, WEI Jiahui1, LIU Haijiao1

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

  • Received: 2017-10-24 Revised: 2018-04-11 Online: 2019-02-25 Published: 2019-02-25
  • Supported by:

    National Natural Science Foundation of China (61762078, 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201705); Northwest Normal University "Student Innovation Ability Program" 2018 Support Project (CX2018Y048)

Abstract:

Text representation is a fundamental task in natural language processing. To address the high dimensionality and sparsity of traditional short text representations, we propose a short text representation learning method based on semantic feature space context, called SFCR. Because the initial feature space has too many dimensions, we first compute the mutual information and co-occurrence relationships between terms to obtain an initial similarity, cluster the terms on that basis, and use the cluster centers to represent the reduced semantic feature space. Then, on the resulting clusters, we combine the context information of the terms and design three similarity calculation methods, each of which computes the similarity between the terms of the text to be represented and the feature terms in the feature space. These similarities form a text mapping matrix that is used for short text representation learning. Experimental results show that the proposed method reflects the semantic information of short texts well and yields a reasonable and effective representation of them.

Key words: semantic feature space, similarity calculation, text mapping matrix, short text representation
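
The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general pipeline it describes, not the authors' SFCR implementation. Several parts are assumptions made for illustration: normalized pointwise mutual information over text-level co-occurrence stands in for the initial term similarity, a single greedy threshold pass stands in for the clustering step, and the maximum term-to-center similarity stands in for the three similarity methods. The function names (`npmi_similarity`, `pick_centers`, `represent`) and the threshold value are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_similarity(docs):
    """Return (sim, term_df): normalized PMI for term pairs that co-occur
    in at least one text, plus each term's document frequency.
    Assumption: co-occurrence is counted at the level of whole texts."""
    n = len(docs)
    term_df, pair_df = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    sim = {}
    for (a, b), df_ab in pair_df.items():
        if df_ab == n:  # pair occurs in every text, so its PMI is 0
            continue
        pmi = math.log(df_ab * n / (term_df[a] * term_df[b]))
        if pmi > 0:  # keep positively associated pairs only
            sim[(a, b)] = sim[(b, a)] = pmi / -math.log(df_ab / n)
    return sim, term_df


def pick_centers(term_df, sim, threshold=0.4):
    """Greedy stand-in for clustering: visit terms from most to least
    frequent; a term becomes a new cluster center unless it is already
    similar enough to an existing center."""
    centers = []
    for term in sorted(term_df, key=lambda t: (-term_df[t], t)):
        if all(sim.get((term, c), 0.0) < threshold for c in centers):
            centers.append(term)
    return centers


def represent(doc, centers, sim):
    """Map a short text to a dense vector with one entry per center:
    the highest similarity between any term of the text and that center."""
    def s(term, center):
        return 1.0 if term == center else sim.get((term, center), 0.0)
    return [max((s(t, c) for t in doc), default=0.0) for c in centers]


if __name__ == "__main__":
    corpus = [
        ["apple", "fruit"],
        ["apple", "fruit", "juice"],
        ["car", "engine"],
        ["car", "road"],
    ]
    sim, term_df = npmi_similarity(corpus)
    centers = pick_centers(term_df, sim)
    print(centers)
    print(represent(["fruit", "juice"], centers, sim))
```

The resulting vector has one dimension per cluster center rather than one per vocabulary term, which is the sense in which a reduced semantic feature space counters the sparsity of a bag-of-words representation. A text that shares no term with the centers can still receive non-zero entries through related terms.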