• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2014, Vol. 36 ›› Issue (08): 1609-1614.

Study on extraction and ranking of temporal semantics and system implementation

SHU Zhongmei (舒忠梅)1, ZUO Yayao (左亚尧)2, ZHANG Zuchuan (张祖传)2

  1. School of Education, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
  2. Faculty of Computer, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2013-05-28 Revised: 2013-09-29 Online: 2014-08-25 Published: 2014-08-25
  • Supported by:

    National Natural Science Foundation of China (60970044); Natural Science Foundation of Guangdong Province (S2011040004281)

Abstract:

General-purpose search engines do not accurately extract temporal expressions from web page content and offer no support for semantic queries over them. To address this, the Temporal Semantic Relevancy Ranking (TSRR) algorithm is proposed, which adds temporal information extraction and temporal ranking functions on top of a general search engine. First, temporal regular-expression rules are introduced to extract temporal expressions, such as temporal points and temporal intervals, from both the query keywords and the web page documents. Second, the text relevancy and the temporal semantic relevancy of each page are combined to compute its final ranking score, and the returned results are ranked accordingly. Experiments show that the TSRR algorithm accurately and effectively matches keyword queries that involve temporal expressions.

Key words: temporal semantics, information extraction, ranking, search engine
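
The two steps summarized in the abstract (regex-based extraction of temporal points and intervals, then a score that combines text relevancy with temporal relevancy) can be illustrated with a minimal Python sketch. This is not the TSRR algorithm itself: the abstract does not give the rules or the scoring formula, so the year-only pattern `YEAR_OR_RANGE`, the Jaccard-style `interval_overlap`, the weighted sum in `combined_score`, and the weight `alpha` are all assumptions made for illustration.

```python
import re

# Assumed rule (not from the paper): a four-digit year, optionally
# followed by a hyphen and a second year, e.g. "2008" or "2005-2010".
YEAR_OR_RANGE = re.compile(r"\b(\d{4})(?:\s*-\s*(\d{4}))?\b")


def extract_intervals(text):
    """Return (start_year, end_year) tuples; a single year is the point (y, y)."""
    intervals = []
    for match in YEAR_OR_RANGE.finditer(text):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        intervals.append((min(start, end), max(start, end)))
    return intervals


def interval_overlap(a, b):
    """Jaccard-style overlap of two inclusive year intervals, in [0, 1]."""
    intersection = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if intersection <= 0:
        return 0.0
    union = max(a[1], b[1]) - min(a[0], b[0]) + 1
    return intersection / union


def temporal_relevancy(query, document):
    """Best overlap between any query interval and any document interval."""
    query_intervals = extract_intervals(query)
    doc_intervals = extract_intervals(document)
    if not query_intervals or not doc_intervals:
        return 0.0
    return max(interval_overlap(q, d)
               for q in query_intervals for d in doc_intervals)


def combined_score(text_score, query, document, alpha=0.5):
    """Assumed combination: weighted sum of text and temporal relevancy."""
    return alpha * text_score + (1 - alpha) * temporal_relevancy(query, document)


if __name__ == "__main__":
    query = "Olympic Games 2008"
    page = "Coverage of events held in Beijing, 2005-2010"
    print(extract_intervals(query), extract_intervals(page))
    print(combined_score(0.8, query, page))
```

In this sketch a page that mentions no temporal expression simply falls back to a scaled text score, so the temporal component can only raise the ranking of pages whose dates overlap those in the query.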