Design of a visual Deep Web crawler platform based on Hadoop

J4 ›› 2016, Vol. 38 ›› Issue (02): 217-223.

• Articles •

LIU Tong1, ZHANG Yang2, SUN Qi2, YUAN Chong2

(1. Beijing Key Laboratory of Cloud Computing Key Technology and Application, Beijing Computing Center, Beijing 100094;
2. Department of ToT and Big Data Applications, Beijing Key Laboratory of Cloud Computing Key Technology and Application, Beijing Computing Center, Beijing 100094, China)

Received: 2015-09-10 Revised: 2015-11-13 Online: 2016-02-25 Published: 2016-02-25

Abstract

With the development of information technology, the information resources available on the Internet have become much richer. Thanks to the rapid development of big data technology, relevant knowledge can now be extracted from this complex mass of Internet information. An essential component is big data crawler technology, which crawls Internet data and saves it in structured form. In this paper, we design and implement an efficient Deep Web information crawler based on Hadoop. The crawler uses WebKit as its core engine, which enables visual configuration and deep data collection. To improve efficiency, the data collection algorithm is also optimized by adjusting the task distribution strategy in Hadoop. Experimental results demonstrate that the resulting data collection platform achieves better results.

Key words: data crawler; Hadoop; visualization

LIU Tong1, ZHANG Yang2, SUN Qi2, YUAN Chong2. Design of a visual Deep Web crawler platform based on Hadoop [J]. J4, 2016, 38(02): 217-223.
