• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (02): 231-237.

• Paper •

Survey on topic-focused crawlers 

YU Juan, LIU Qiang

  1. (School of Economics and Management, Fuzhou University, Fuzhou 350108, China)
  • Received: 2013-08-27 Revised: 2013-10-18 Online: 2015-02-25 Published: 2015-02-25

Abstract:

With the exponential growth of online information resources and users' increasing demand for personalized content, topic-focused crawlers have emerged to meet this need. Topic-focused crawlers are programs designed to download web pages that are relevant to specific topics. Using information gathered at run time, they explore the Web by following promising hyperlinks and fetch only pages that appear to be relevant. Search engines and corpus-building systems based on topic-focused crawling are widely used. We first define the goals and operating principles of focused crawling and comprehensively analyze recent advances in China and abroad. We then compare the crawling strategies of various topic-focused crawlers, along with the advantages and disadvantages of the related algorithms. Finally, we point out future directions for topic-focused crawling.

Key words: web crawler; focused crawler; search engine
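
The operating principle summarized in the abstract (score pages at run time, follow the most promising hyperlinks first, keep only pages that appear relevant) can be illustrated with a minimal best-first sketch. This is not the authors' method or any specific algorithm from the survey. The in-memory `TOY_WEB` graph, the term-frequency `relevance` score, the 0.5/0.5 priority weighting, and the `threshold` and `max_pages` parameters are all assumptions chosen for illustration.

```python
import heapq

def relevance(text, topic_terms):
    """Toy score (an assumption): fraction of words in `text` that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)

def focused_crawl(web, seeds, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl over `web`, a dict: url -> (page_text, [(anchor_text, target_url)])."""
    # heapq is a min-heap, so priorities are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dangling link in the toy graph
        text, links = web[url]  # stands in for an HTTP fetch
        fetched += 1
        page_score = relevance(text, topic_terms)
        if page_score >= threshold:
            kept.append(url)  # keep only pages that appear relevant
        for anchor, target in links:
            if target in seen:
                continue
            seen.add(target)
            # Hypothetical weighting: parent page relevance plus anchor text relevance.
            priority = 0.5 * page_score + 0.5 * relevance(anchor, topic_terms)
            heapq.heappush(frontier, (-priority, target))
    return kept

# Hypothetical four-page web used only for this illustration.
TOY_WEB = {
    "a": ("web crawler search engine overview",
          [("focused crawler survey", "b"), ("cooking recipes", "c")]),
    "b": ("focused crawler fetches topic pages", []),
    "c": ("pasta and tomato sauce", [("crawler cookbook", "d")]),
    "d": ("baking bread at home", []),
}
TOPIC = {"crawler", "search", "focused"}

if __name__ == "__main__":
    print(focused_crawl(TOY_WEB, ["a"], TOPIC))
```

In this sketch the page budget `max_pages` is what keeps low-priority links from being fetched at all; a real crawler would replace the dictionary lookup with network requests and a stronger relevance model.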