• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Automatic patent query expansion based on word embedding

LIU Meng-lan (1,2), LIU Bin (1,2), PENG Zhi-yong (1,2)
 
  

  (1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072;
   2. School of Computer, Wuhan University, Wuhan 430072, China)
     
  • Received: 2017-07-10  Revised: 2017-09-15  Online: 2017-12-25  Published: 2017-12-25

Abstract:

Patent retrieval differs substantially from general information retrieval. A patent document comprises claims, an abstract and a full-text description, so retrieval algorithms designed for ordinary text cannot simply be applied to patents. Patent retrieval typically suffers from low recall, for two reasons. First, patent texts use highly specialized terminology and complex phrasing, which makes it hard to capture the search intent behind a user's query and leads to unsatisfactory results. Second, inventors deliberately coin distinctive terms when drafting patents so that the documents are harder to retrieve. Many retrieval algorithms have been designed to improve recall, but open problems remain and their effectiveness can still be improved. We propose an automatic patent query expansion model based on word embedding. Using word embeddings, we construct a keyword network for the patent domain and then apply a dense subgraph discovery algorithm to find expansion terms, which improves the quality of the selected terms. Extensive experiments on the CLEF-IP 2012 dataset show that the proposed algorithm yields flexible and effective expansion terms and improves the recall of patent retrieval.
 

Key words: patent retrieval, query expansion, word embedding, deep learning
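
The pipeline outlined in the abstract (word embeddings, then a keyword network, then dense subgraph discovery) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how edges are defined or which dense subgraph algorithm is used, so the sketch assumes cosine-similarity edges above a threshold and substitutes a greedy peeling heuristic (repeatedly removing the lowest-degree non-query node). The toy vectors, the 0.9 threshold and all function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical 3-d toy vectors; a real system would train embeddings
# (e.g. word2vec-style) on a patent corpus.
EMBEDDINGS = {
    "engine":     (0.90, 0.10, 0.00),
    "motor":      (0.85, 0.20, 0.05),
    "turbine":    (0.80, 0.30, 0.10),
    "propulsion": (0.75, 0.25, 0.00),
    "polymer":    (0.10, 0.90, 0.20),
    "resin":      (0.05, 0.85, 0.30),
    "antenna":    (0.00, 0.20, 0.95),
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_keyword_graph(embeddings, threshold):
    """Assumed edge rule: link two keywords if their similarity >= threshold."""
    graph = {word: set() for word in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def reachable(graph, seeds):
    """All nodes connected to at least one seed node."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def dense_subgraph(graph, required):
    """Greedy peeling: drop the lowest-degree non-required node each round
    and keep the node set with the highest edges/nodes density."""
    nodes = reachable(graph, required)
    adj = {u: graph[u] & nodes for u in nodes}
    edges = sum(len(vs) for vs in adj.values()) // 2
    best, best_density = set(nodes), edges / len(nodes)
    while nodes - required:
        victim = min(nodes - required, key=lambda u: (len(adj[u]), u))
        for neighbour in adj[victim]:
            adj[neighbour].discard(victim)
        edges -= len(adj.pop(victim))
        nodes.remove(victim)
        density = edges / len(nodes)
        if density > best_density:
            best, best_density = set(nodes), density
    return best

def expand_query(query_terms, embeddings=EMBEDDINGS, threshold=0.9):
    """Return expansion terms: members of the dense subgraph around the query."""
    graph = build_keyword_graph(embeddings, threshold)
    required = {t for t in query_terms if t in graph}
    if not required:
        return []  # no query term is in the vocabulary
    return sorted(dense_subgraph(graph, required) - required)

print(expand_query(["engine"]))  # -> ['motor', 'propulsion', 'turbine']
```

With these toy vectors, the four propulsion-related words form a clique, so a query for "engine" is expanded with the other three, while "polymer", "resin" and "antenna" are left out. On a real vocabulary the threshold and the density criterion would need tuning against recall on a benchmark such as CLEF-IP.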