• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (5): 166-172.

• Papers •

Research and design of search engine for digital works based on Lucene

WU Jieming,HAN Yunhui,JI Dandan   

  1. (Information Engineering Institute, North China University of Technology, Beijing 100144, China)
  • Received:2012-08-24 Revised:2012-11-02 Online:2013-05-25 Published:2013-05-25

Abstract:

Building on the Lucene full-text retrieval toolkit, this paper analyzes the current mainstream Chinese word segmentation algorithms and the Lucene relevance sorting algorithm, and proposes an improved segmentation algorithm and an improved relevance sorting algorithm. It also applies inverted indexing, search technologies, distributed storage, and parallel computing to the analysis and design of a search engine for massive collections of digital works, providing users with a fast and accurate search service. The experiments compare segmentation speed and segmentation results, as well as the response time, hit count, accuracy, and recall rate of keyword searches. The experimental results show that the system improves search speed while preserving the accuracy of the search results.

Key words: Lucene; segmentation algorithm; index; relevance sorting algorithm; distributed