• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (02): 252-256.

• Article

Design and Implementation of a Full-Text Retrieval System Based on Lucene

周敬才1,胡华平1,2,岳虹1   

  1. (1. Troop 61070, Fuzhou, Fujian 350003; 2. College of Computer, National University of Defense Technology, Changsha, Hunan 410073)
  • Received: 2013-06-24  Revised: 2013-09-29  Online: 2015-02-25  Published: 2015-02-25
  • Supported by:

    National 863 Program of China (2012AA7116048)

Design and implementation of Lucene-based full-text retrieval system

ZHOU Jingcai1,HU Huaping1,2,YUE Hong1   

  1. (1. Troop 61070, Fuzhou 350003; 2. College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received:2013-06-24 Revised:2013-09-29 Online:2015-02-25 Published:2015-02-25

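Abstract:

As the level of informatization keeps rising, quickly finding the required content in massive amounts of information has become a current research focus. Based on an analysis of the basic principles of full-text retrieval and the architecture of Lucene, this paper proposes an MVC-pattern full-text retrieval model and implements a full-text retrieval system built on the SSH framework and the Lucene search engine. The system extends the range of supported document types: besides TXT and the various MS Office formats, it can also search PDF, HTML, and RTF documents. It improves the Chinese word segmenter, raising both segmentation efficiency and accuracy. It also improves human-computer interaction, providing a result display similar to that of Baidu and Google in which the search keywords are highlighted. Use of the system in practice shows that it builds indexes efficiently, retrieves quickly, and returns comprehensive results.

Keywords: Lucene, document parsing, full-text retrieval, search engine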

Abstract:

As informatization advances, high-performance, full-featured text search systems that can quickly locate matching records in massive data have become a research hotspot. Based on an analysis of the fundamentals of full-text retrieval and the structure of the Lucene system, we present an MVC-pattern full-text retrieval model and develop a retrieval system based on the SSH framework and the Lucene search engine. The system makes three contributions. First, it extends the supported file formats, adding PDF, HTML, and RTF to TXT and MS Office documents. Second, it improves the efficiency and accuracy of the Chinese word segmenter. Third, it improves human-machine interaction with a result display similar to that of Baidu and Google, in which the search keywords are highlighted. Practical use of the system shows that it creates indexes efficiently, searches quickly, and returns comprehensive results.

Key words: Lucene; document parsing; full-text retrieval; search engine
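
The abstract names two mechanisms that are easy to illustrate in miniature: building an index over documents and highlighting the search keywords in the results. The following is a minimal sketch in plain Java, not the authors' implementation and not Lucene's API. The class `MiniIndex`, its methods, and the naive tokenizer are hypothetical names introduced only for illustration; a real system would use Lucene's analyzers (including a Chinese word segmenter) and its own highlighting support instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Toy inverted index with keyword highlighting (illustrative only, not Lucene). */
public class MiniIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // document id -> original text, kept so hits can be highlighted
    private final Map<Integer, String> docs = new HashMap<>();

    /** Naive tokenizer (an assumption): lowercase, split on non-letter/digit runs. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Stores the document and records each of its terms in the postings map. */
    public void add(int id, String text) {
        docs.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Returns ids of documents containing every query term, in ascending order. */
    public List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    /** Wraps each whole-word, case-insensitive query-term occurrence in <b> tags. */
    public String highlight(int id, String query) {
        String text = docs.get(id);
        if (text == null) throw new IllegalArgumentException("unknown document id: " + id);
        List<String> quoted = new ArrayList<>();
        for (String term : tokenize(query)) quoted.add(Pattern.quote(term));
        if (quoted.isEmpty()) return text;
        // One alternation pattern, applied once, so inserted tags are never re-matched.
        Pattern p = Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        MiniIndex index = new MiniIndex();
        index.add(1, "Lucene is a full-text search library");
        index.add(2, "PDF and HTML parsing");
        index.add(3, "Indexing with Lucene: fast search");
        System.out.println(index.search("lucene search"));
        System.out.println(index.highlight(3, "lucene"));
    }
}
```

The sketch keeps the original text alongside the postings so that hits can be highlighted at display time. Splitting on non-letter characters does not segment Chinese text into words, which is why the paper's improved segmenter matters in practice.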