J4 ›› 2011, Vol. 33 ›› Issue (1): 157-160. doi: 10.3969/j.issn.1007130X.2011.
• Articles •
PENG Dong,CAI Wandong
Abstract:
Web spiders play an important role in information gathering, but they face new challenges when used to crawl Web forums. This paper studies the basic technologies of Web forum crawling, and designs and implements a system for gathering Web forum information. A traversal strategy is proposed based on the information structure of forums, and a DOM and block algorithm is proposed based on the distribution of the context. Experimental results show that the traversal strategy retrieves highly subject-relevant Web pages more efficiently than traditional traversal methods, and that applying the proposed method to context extraction from Web pages effectively improves the accuracy of information collection.
Key words: web spider; web forum; context extracting; subject relevant
PENG Dong,CAI Wandong. The Web Forum Crawling Technology and System Implementation[J]. J4, 2011, 33(1): 157-160.
URL: http://joces.nudt.edu.cn/EN/10.3969/j.issn.1007130X.2011.
http://joces.nudt.edu.cn/EN/Y2011/V33/I1/157