• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (03): 560-570.

• Artificial Intelligence and Data Mining •

A distributed retrieval method based on Mongolian news domain ontology

ZHAO Jun-sheng1, WANG Xin-yu2, YIN Yu-jie1, ZHANG Lin1

  (1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China; 2. Basic Department, Special Police of China, Beijing 100875, China)
  • Received: 2020-03-17 Revised: 2020-05-19 Accepted: 2021-03-25 Online: 2021-03-25 Published: 2021-03-29
  • Funding:
    National Natural Science Foundation of China (61966027, 61363052); Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0614); Key Natural Science Fund of Inner Mongolia University of Technology (ZD201416)

Abstract: Existing research on the Mongolian semantic Web is based entirely on stand-alone environments. Once a semantic Web information retrieval system is put into actual operation, a stand-alone environment suffers from limited storage capacity and slow responses to concurrent multi-user queries. To address these problems, a distributed semantic Web retrieval method based on a Mongolian news domain ontology is proposed. First, according to the characteristics of the Mongolian news domain and with reference to the seven-step method and the skeleton method, a Mongolian news domain ontology is constructed, and a hybrid semantic similarity algorithm suited to the ontology is studied for semantic expansion. Then, the ontology data and the algorithm are deployed on the Hadoop distributed platform, which addresses the logical description, physical structure, and parallel processing of large-scale ontology data storage and yields a distributed retrieval system based on the Mongolian news domain ontology. Experimental results show that the method effectively reduces the response time for keyword queries and improves the recall and precision of news retrieval.


Key words: Mongolian semantic Web, news domain ontology, Hadoop, retrieval performance evaluation, query response time
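
The abstract reports gains in recall and precision without restating how these are computed. As a minimal sketch, assuming the standard set-based definitions (the paper's exact evaluation protocol is not given here), the two measures for a single query could be calculated as follows. The function name and the document IDs are hypothetical and serve only as illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    An empty denominator yields 0.0 (a convention assumed here).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical news document IDs, for illustration only.
retrieved_docs = ["n1", "n2", "n3", "n4"]
relevant_docs = ["n2", "n3", "n5"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Semantic expansion of a query typically enlarges the retrieved set, which tends to raise recall; precision improves only if the added documents are mostly relevant, which is why both measures are reported together.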