A patent literature term extraction method
based on the boundary tag sets

J4 ›› 2015, Vol. 37 ›› Issue (8): 1591-1598.

• Papers •

DING Jie1, LÜ Xueqiang1, LIU Kehui2

(1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research,
Beijing Information Science and Technology University, Beijing 100101;
2. Beijing Research Center of Urban System Engineering, Beijing 100035, China)

Received: 2014-04-21 Revised: 2014-08-21 Online: 2015-08-25 Published: 2015-08-25

Abstract

Most current term boundary detection methods measure the cohesion between adjacent strings by choosing a suitable statistic and setting a threshold on it. However, these methods perform poorly on long terms. To address the low recall of long terms in term extraction, we propose a patent literature term extraction method based on boundary tag sets, developed from a study of a large body of patent documents. We first introduce the concept of a boundary tag set and then construct boundary tag sets from the characteristics of term boundaries in patent literature. We also propose a new seed-term weighting approach for extracting seed terms. Patent terminology is compared against the Chinese Daily corpus to build a term component library, which improves the termhood of the candidate terms. Finally, the candidate terms are filtered by boundary entropy. Experimental results show that the proposed method achieves a precision of 81.67%, a recall of 71.92%, and an F value of 0.765, outperforming the other methods compared in this paper.

Key words: boundary tag set; seed term; term component library; boundary entropy
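
The abstract names boundary entropy as the final filter but does not give its formula or threshold. As a rough illustration only, the sketch below uses the common left/right neighbor entropy formulation: a candidate whose neighboring characters vary widely is more likely to be a complete term. The function names, the `threshold` value, and the "keep if the smaller side passes" rule are all assumptions for illustration, not the authors' settings. A small `f_measure` helper is included to show that the reported precision and recall are consistent with the reported F value.

```python
import math
from collections import Counter


def boundary_entropy(neighbors):
    """Shannon entropy (in bits) of the characters adjacent to a candidate."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def neighbor_entropies(candidate, corpus):
    """Return (left, right) boundary entropy of `candidate` over `corpus` strings."""
    left, right = [], []
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left.append(sentence[start - 1])   # character just before the candidate
            if end < len(sentence):
                right.append(sentence[end])        # character just after the candidate
            start = sentence.find(candidate, start + 1)
    return boundary_entropy(left), boundary_entropy(right)


def keep_candidate(candidate, corpus, threshold=1.0):
    """Hypothetical filter rule: keep a candidate if both sides are varied enough."""
    left, right = neighbor_entropies(candidate, corpus)
    return min(left, right) >= threshold


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F value)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_corpus = ["xABy", "zABy", "xABw"]  # toy data, not from the paper
    print(neighbor_entropies("AB", toy_corpus))
    print(round(f_measure(0.8167, 0.7192), 3))
```

With the reported precision of 81.67% and recall of 71.92%, the harmonic mean rounds to 0.765, matching the F value in the abstract.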

DING Jie, LÜ Xueqiang, LIU Kehui. A patent literature term extraction method based on the boundary tag sets [J]. J4, 2015, 37(8): 1591-1598.
