• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (11): 68-75.

• Articles

Dynamic aggregation of noncontractual Earth observation data sources

HUANG Keying1,3, GAO Yue2, LI Guoqing1

  1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China
  2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  3. University of Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2013-08-10  Revised: 2013-10-18  Online: 2013-11-25  Published: 2013-11-25
  • Supported by:

    National 863 Program of China (2012AA12A301)

Abstract:

The Internet hosts a large number of free, open, and valuable noncontractual Earth observation data sources. These sources are reached through web query portals, keep massive volumes of data hidden in large back-end databases, are served by diverse data sharing platforms, and sit on different kinds of spatial data platforms that are difficult to interconnect. As a result, traditional techniques cannot readily aggregate and share their data. After describing the problems currently encountered, we propose a passive aggregation architecture for noncontractual, heterogeneous, distributed data sources, based on a deep web crawler architecture. We design a data source identification standard, a noncontractual data source discovery mechanism, a mode for building search condition trees for noncontractual data sources, a noncontractual data source indexing mechanism, and asynchronous data source update rules. Using these mechanisms, we successfully aggregated five large Earth observation data sources distributed across different international network domains, including three widely used operational data sources: NASA, USGS, and ASAR. The result is a tool set for automatically aggregating and updating Earth observation data resources. Ultimately, users can obtain information on noncontractual Earth observation data resources through a unified query interface.

Key words: Earth observation data search; noncontractual data sources; deep web crawler; incremental crawler
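
The abstract mentions building a search condition tree for each data source but does not describe its structure. As a purely illustrative sketch, one plausible reading is a tree in which each node fixes one query-form field to one value, so that every root-to-leaf path yields one query to submit to the source's web portal. The class name `ConditionNode`, the function `enumerate_queries`, and the field names `sensor`, `year`, and `region` are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class ConditionNode:
    """One query-form field fixed to one value (hypothetical structure)."""
    name: str
    value: str
    children: List["ConditionNode"] = field(default_factory=list)


def enumerate_queries(
    node: ConditionNode, prefix: Optional[Dict[str, str]] = None
) -> Iterator[Dict[str, str]]:
    """Yield one parameter dict per root-to-leaf path of the tree."""
    params = dict(prefix or {})      # copy so sibling branches stay independent
    params[node.name] = node.value
    if not node.children:            # leaf: the path is a complete query
        yield params
        return
    for child in node.children:
        yield from enumerate_queries(child, params)


def build_example_tree() -> ConditionNode:
    # Made-up fields and values, for illustration only.
    return ConditionNode("sensor", "ASAR", [
        ConditionNode("year", "2011", [
            ConditionNode("region", "asia"),
            ConditionNode("region", "europe"),
        ]),
        ConditionNode("year", "2012", [
            ConditionNode("region", "asia"),
        ]),
    ])


queries = list(enumerate_queries(build_example_tree()))
```

Under this reading, a deep web crawler would submit each enumerated parameter set to the portal's query form and index the returned records. Pruning branches that are known to return no data keeps the number of submitted queries manageable.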