J4 ›› 2015, Vol. 37 ›› Issue (02): 231-237.
• Articles •
于娟,刘强
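Funding:
National Natural Science Foundation of China (71201032); Fujian Provincial Social Science Planning Project (2012C021); Social Science Research Project of the Fujian Provincial Department of Education (JA11040S)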
YU Juan, LIU Qiang
Abstract:
With the exponential growth of online information resources and users' increasingly personalized needs, topic-focused crawlers have emerged. A topic-focused crawler is a program that downloads web pages relevant to a specific topic. Using information gathered while crawling, it follows promising hyperlinks and fetches only pages that appear relevant to the topic. Applications such as search engines built on topic-focused crawlers and domain-specific corpus construction are already in wide use. We first present the definition and operating principles of topic-focused crawlers, then review recent research in China and abroad and compare the strengths and weaknesses of the various crawling strategies and related algorithms. Finally, we point out future research directions for topic-focused crawlers.
Key words: web crawler; focused crawler; search engine
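The working principle summarized in the abstract (score pages as they are collected, then follow only the promising links) can be sketched as a best-first crawl. The sketch below is illustrative only and is not taken from the paper: the in-memory `TOY_WEB`, the keyword-overlap `relevance` score, the `threshold` value, and the rule that a link inherits its parent page's score are all assumptions chosen to keep the example self-contained.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a": ("web crawler basics and search engine indexing", ["b", "c"]),
    "b": ("focused crawler strategies for topic search", ["d"]),
    "c": ("cooking recipes for pasta", ["e"]),
    "d": ("crawler link scoring", []),
    "e": ("more pasta recipes", []),
}


def relevance(text, topic_terms):
    """Fraction of topic terms found in the page text (an assumed, crude score)."""
    words = set(text.lower().split())
    return sum(term in words for term in topic_terms) / len(topic_terms)


def focused_crawl(seed, topic_terms, fetch, threshold=0.3, max_pages=10):
    """Best-first crawl: visit the most promising link first, keep on-topic pages."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: discard it and do not follow its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link is as promising as the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant


result = focused_crawl("a", ["crawler", "search"], lambda url: TOY_WEB[url])
print(result)  # ['a', 'b', 'd']
```

In this toy run the off-topic page `c` is downloaded once so that it can be scored, but its link to `e` is never followed, which is the pruning behavior that distinguishes a focused crawler from a general-purpose one.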
YU Juan, LIU Qiang. Survey on topic-focused crawlers[J]. J4, 2015, 37(02): 231-237.
Link to this article: http://joces.nudt.edu.cn/CN/
http://joces.nudt.edu.cn/CN/Y2015/V37/I02/231