Research on Web Collection Using an Incremental-Update Crawler

J4, 2006, Vol. 28, Issue (12): 28-30.


Publication date: 2006-12-01; Release date: 2010-05-20


Abstract

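Addressing various problems in current Web information mining, this paper studies web crawler systems and proposes a Web page collection method, the incremental-update Crawler method, which is based on the principles of the HTTP protocol and aims to reduce the network traffic a web crawler system generates while it runs. The method uses Web prefetching to evolve and update an existing Web link database, reducing network traffic while achieving results close to those of existing web crawler systems.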

Key words: information retrieval, web crawler, incremental updating
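The abstract does not say which HTTP mechanism the method builds on. A minimal sketch, assuming it is the HTTP conditional GET (the crawler sends If-None-Match / If-Modified-Since with the validators it stored last time, and the server answers 304 Not Modified without a body when the page is unchanged), shows how a crawler can refresh a stored link database while only downloading pages that actually changed. PageRecord, conditional_headers, refresh, urllib_fetch and the fetcher signature are hypothetical names introduced for illustration; the paper's own prefetching strategy is not reproduced here.

```python
"""Sketch: refresh a stored page database with HTTP conditional GET.

Assumption (not stated in the paper): incremental updating is modelled
as conditional requests, so an unchanged page costs a short 304 reply
instead of a full download.
"""
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher signature:
# (url, request_headers) -> (status, lower-cased response headers, body)
Fetcher = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


@dataclass
class PageRecord:
    """One entry of the (hypothetical) Web link database."""
    url: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: bytes = b""


def conditional_headers(rec: PageRecord) -> Dict[str, str]:
    """Send stored validators so the server can answer 304 if nothing changed."""
    headers: Dict[str, str] = {}
    if rec.etag:
        headers["If-None-Match"] = rec.etag
    if rec.last_modified:
        headers["If-Modified-Since"] = rec.last_modified
    return headers


def urllib_fetch(url: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Network fetcher; urllib raises HTTPError for 304 and other non-2xx codes."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp_headers = {k.lower(): v for k, v in resp.headers.items()}
            return resp.status, resp_headers, resp.read()
    except urllib.error.HTTPError as err:
        err_headers = {k.lower(): v for k, v in err.headers.items()} if err.headers else {}
        return err.code, err_headers, b""


def refresh(db: Dict[str, PageRecord], fetch: Fetcher = urllib_fetch) -> Dict[str, int]:
    """Re-check every stored page; transfer bodies only for changed pages."""
    stats = {"unchanged": 0, "updated": 0, "failed": 0, "bytes": 0}
    for rec in db.values():
        status, resp_headers, body = fetch(rec.url, conditional_headers(rec))
        if status == 304:  # Not Modified: keep the stored copy, no body sent
            stats["unchanged"] += 1
        elif status == 200:  # changed or never validated: store new copy and validators
            rec.body = body
            rec.etag = resp_headers.get("etag", rec.etag)
            rec.last_modified = resp_headers.get("last-modified", rec.last_modified)
            stats["updated"] += 1
            stats["bytes"] += len(body)
        else:  # any other status is left for a later pass
            stats["failed"] += 1
    return stats
```

Passing a fake fetcher to refresh lets the update logic be checked offline; with the default urllib_fetch, the saving comes from every 304 reply, which carries headers only, so bandwidth is spent on page bodies only when a page has changed.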


CHENG Fei, WANG Jian-hai, LUO Jian. Research on Web Collection Using an Incremental-Update Crawler [J]. J4, 2006, 28(12): 28-30.
