• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (04): 800-806.

• Article

Chinese patent attribute-value pair extraction technology and its application

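SUN Dongpu, ZHU Minghua, LIN Hongfei

  1. (School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China)
  • Received: 2015-01-27 Revised: 2015-05-15 Online: 2016-04-25 Published: 2016-04-25
  • Supported by:

    National Natural Science Foundation of China (61202254, 61402075); Natural Science Foundation of Liaoning Province (201202031, 201402003)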

Chinese patent attributevalue
extraction technology and its application          

SUN Dongpu,ZHU Minghua,LIN Hongfei   

  1. (School of Computer Science and Technology,Dalian University of Technology,Dalian 116024,China)
  • Received:2015-01-27 Revised:2015-05-15 Online:2016-04-25 Published:2016-04-25

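Abstract:

Patent information extraction is the foundation of patent analysis, and identifying and extracting attributes and attribute values is the key problem it must solve. At present, few studies in Chinese patent information extraction address the simultaneous extraction of attributes and attribute values. Using Chinese patent abstracts as the experimental corpus and drawing on statistical learning, this paper proposes an extraction method based on conditional random fields. The method treats attributes and attribute values as named entities and trains a conditional random field model on the corpus to extract them; it then uses mined association rules to match attributes with attribute values. The experiments yield a precision of 80.8%, a recall of 81.2%, and an F-score of 81.0%, indicating that the method can efficiently extract attributes and attribute values at the same time. In addition, based on the extraction results, the paper carries out patent analysis and a comparison of similar patents, which demonstrates the practical value of the method.

Key words: attribute extraction, attribute value extraction, Chinese patent, conditional random fields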

Abstract:

Patent information extraction is the foundation of patent analysis, and identifying and extracting attributes and attribute values is its key problem. However, few studies on Chinese patent information extraction address extracting attributes and their values simultaneously. Using Chinese patent abstracts as the corpus, we propose a method based on conditional random fields (CRFs), a statistical learning technique. First, treating attributes and attribute values as named entities, we train a CRF model on the corpus and use it to extract attributes and attribute values. Second, we use mined association rules to match attributes with their values. Experimental results show precision, recall, and F-score of 80.8%, 81.2%, and 81.0%, respectively. Patent analysis and a comparison of similar patents based on the extraction results demonstrate the practical value of the method.
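
As a consistency check on the reported figures, the following worked calculation assumes that the F-score is the balanced harmonic mean (F1) of the first two figures, taking the first as precision P and the second as recall R. The abstract does not state the formula, so this is an assumption rather than the authors' stated definition.

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.808 \times 0.812}{0.808 + 0.812}
  = \frac{1.312192}{1.620}
  \approx 0.810
```

Under that assumption the result matches the reported 81.0%.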

Key words: attribute extraction; attribute value extraction; Chinese patent; conditional random fields (CRFs)
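
To make the "attributes and values as named entities" idea concrete, here is a minimal Python sketch of the step that would follow sequence labelling: collapsing per-token labels into attribute and value spans, then pairing them. Everything specific in it is an assumption for illustration: the BIO tag names (`B-ATTR`, `I-ATTR`, `B-VAL`, `I-VAL`, `O`), the example sentence, and the helper names `decode_bio` and `pair_adjacent`. The abstract does not give the labelling scheme, and the adjacency pairing below is a simple stand-in, not the association rules the authors mined. Training the CRF itself is omitted.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (entity type, surface text)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collapse a BIO-tagged token sequence into (type, text) spans."""
    spans: List[Span] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any open one first.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an I- tag with no matching open entity: close and skip.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_tokens)))
    return spans


def pair_adjacent(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair each ATTR span with the VAL span right after it in the span list.

    Hypothetical stand-in for the association-rule matching in the paper.
    """
    pairs = []
    for (type_a, text_a), (type_b, text_b) in zip(spans, spans[1:]):
        if type_a == "ATTR" and type_b == "VAL":
            pairs.append((text_a, text_b))
    return pairs


# Hypothetical sentence: "diameter is 5 mm, length is 10 mm".
tokens = ["直径", "为", "5", "毫米", ",", "长度", "为", "10", "毫米"]
tags = ["B-ATTR", "O", "B-VAL", "I-VAL", "O", "B-ATTR", "O", "B-VAL", "I-VAL"]

spans = decode_bio(tokens, tags)
pairs = pair_adjacent(spans)
print(pairs)
```

In a real pipeline the `tags` list would come from the trained CRF model rather than being written by hand, and the pairing step would need to handle attributes and values that are not adjacent, which is presumably where mined association rules help.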