• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (9): 180-183.

• Articles •

An Efficient Approach to Just-in-Time Focused News Acquisition

WANG Xin1, HUANG Sui1, LONG Shun1,2

  1. Department of Computer Science, Jinan University, Guangzhou 510632, Guangdong, China
  2. Emergency Technology Research Center of Risk Evaluation and Prewarning on
    Public Network Security, Guangzhou 510632, Guangdong, China
  • Received: 2012-04-13 Revised: 2012-06-25 Online: 2012-09-25 Published: 2012-09-25

Abstract:

The rapid development of the Internet has led to an explosive increase in the amount of information. How to collect the required information more quickly has long been a hot topic in research and development, both in China and abroad. In recent years, the growing demand for specific information (such as news in a particular domain) has required that relevant information be acquired from designated websites in a targeted, just-in-time manner. Such news is generally unpredictable, frequently updated, and highly time-sensitive, so acquisition must be both just-in-time and focused to cope with these characteristics. This paper proposes an effective method for fetching and analyzing web pages; in practice it has achieved satisfactory results.

Key words: news acquisition; crawler; just-in-time