J4 ›› 2013, Vol. 35 ›› Issue (1): 160-165.
• Articles •
BAI Yuzhao, LIANG Jiuzhen
Abstract:
Building on a study of existing focused crawlers, this paper proposes a focused crawler based on a probabilistic model. The crawler analyzes the features collected during the crawl and uses the probabilistic model to compute a priority for each URL, which is then used to filter and order URLs. This remedies a weakness of most existing crawlers, which usually rely on a single strategy for fetching pages from the Internet. The distinguishing feature of the proposed crawler is that it considers not only topic relevance but also historical evaluation and web page quality, so that the “topic drift” and “tunneling” problems are solved and the quality of the retrieved resources is guaranteed. Experimental results show that, compared with other focused crawlers, the crawler based on probabilistic prediction gathers more topic-relevant web pages while retrieving fewer pages overall, and achieves a higher average topic relevance.
Key words: focused crawler; probabilistic model; URL filtering; URL ordering; priority value
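The abstract does not give the model itself, so the following is only a minimal sketch of the general idea: several per-URL features are combined into one probability-like priority, which is then used both to filter URLs (threshold) and to order them (priority queue). The logistic combination, the weights, the bias, the threshold, and all names (`UrlFeatures`, `priority`, `Frontier`) are assumptions made for illustration and are not taken from the paper. The third feature reads the abstract's web quality criterion as a page-quality score, which is also an assumption.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class UrlFeatures:
    # All three features are assumed to be pre-normalized to [0, 1].
    topic_relevance: float  # e.g. similarity of anchor text to the topic
    history_score: float    # e.g. share of relevant pages already seen on this path
    page_quality: float     # e.g. a normalized link-based quality score


# Hypothetical weights and bias; a real system would have to fit or tune these.
WEIGHTS = (2.0, 1.5, 1.0)
BIAS = -2.0


def priority(f: UrlFeatures) -> float:
    """Combine the features into a value in (0, 1) with a logistic function."""
    z = (BIAS
         + WEIGHTS[0] * f.topic_relevance
         + WEIGHTS[1] * f.history_score
         + WEIGHTS[2] * f.page_quality)
    return 1.0 / (1.0 + math.exp(-z))


class Frontier:
    """Crawl frontier that filters by threshold and pops the highest priority first."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self._heap = []      # entries are (-priority, url) because heapq is a min-heap
        self._seen = set()   # URLs already admitted to the frontier

    def push(self, url: str, features: UrlFeatures) -> bool:
        """Admit the URL if it is new and its priority reaches the threshold."""
        if url in self._seen:
            return False
        p = priority(features)
        if p < self.threshold:
            return False     # filtering step: low-priority URLs are dropped
        self._seen.add(url)
        heapq.heappush(self._heap, (-p, url))
        return True

    def pop(self):
        """Return (url, priority) for the highest-priority URL in the frontier."""
        neg_p, url = heapq.heappop(self._heap)
        return url, -neg_p
```

Under these assumed weights, a URL with low topic relevance can still be admitted when its history and quality scores are high, which is one simple way a multi-feature score can keep following links through weakly relevant pages instead of discarding them outright.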
BAI Yuzhao, LIANG Jiuzhen. Research and implementation for focused crawler based on probabilistic model[J]. J4, 2013, 35(1): 160-165.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2013/V35/I1/160