• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (4): 115-120.

• Papers •

An Analysis of the Search Engine User Behaviors Based on Hadoop

WANG Zhenyu1,GUO Li2   

  1. (1.School of Software Engineering,South China University of Technology,Guangzhou 510006;(2.School of Computer Science and Engineering,South China University of Technology,Guangzhou 510006,China)
  • Received:2010-03-28 Revised:2010-07-15 Online:2011-04-25 Published:2011-04-25

Abstract:

Analysis of search engine user behavior is a focus of web information retrieval. It examines users' click behavior to mine information that can improve a search engine's efficiency and retrieval services. To address the scalability and programming-difficulty bottlenecks of traditional parallel computing models, a model for processing massive log data based on Hadoop is presented, which improves scalability and ease of programming through the Hadoop Distributed File System (HDFS) and MapReduce. Based on this model, an experiment is carried out that analyzes 22 million query log records collected from the Sogou search engine over one month. The analysis results are instructive for understanding user behavior and for evaluating and improving search and ranking algorithms.

Key words: Hadoop; distributed computing; user behavior analysis; massive data
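
The MapReduce style of log processing mentioned in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use Hadoop itself; it only mimics the map, shuffle, and reduce steps in plain Python to count how often each query appears. The three-field tab-separated log layout (user ID, query, clicked URL) and all function names are assumptions made for illustration, not the format or code used in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed (hypothetical) log layout, one record per line:
#   user_id <TAB> query <TAB> clicked_url


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (query, 1) for every well-formed log line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records
        _user_id, query, _url = fields
        yield query, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the counts collected for each query."""
    return {query: sum(counts) for query, counts in groups.items()}


def count_queries(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on the given log lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = [
        "u1\thadoop\thttp://a.example",
        "u2\thadoop\thttp://b.example",
        "u1\tmapreduce\thttp://c.example",
    ]
    # Print queries from most to least frequent.
    ranked = sorted(count_queries(sample).items(), key=lambda kv: (-kv[1], kv[0]))
    for query, n in ranked:
        print(f"{query}\t{n}")
```

In a real Hadoop job the map and reduce functions would run on separate nodes over HDFS blocks and the framework would perform the shuffle; the sketch keeps everything in one process so that the data flow is easy to follow.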