• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (03): 404-410.

• Articles •

An associated storage and retrieval system of massive Web data based on multi-attributes

LUO Fang1, LI Chunhua1, ZHOU Ke1, HUANG Yongfeng2, LIAO Zhengshuang1

  (1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074;
    2. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China)
  • Received:2013-06-08 Revised:2013-10-20 Online:2014-03-25 Published:2014-03-25

Abstract:

Traditional Web retrieval commonly relies on full-text search, which offers good flexibility. However, public opinion analysis usually also needs information about the attributes of Web pages and related statistics, which full-text search cannot supply well. This paper designs and implements an associated storage and retrieval system on the Hadoop platform that provides a sound basic service for public opinion analysis. First, associated storage of Web data on HDFS is realized by means of machine learning. Second, the small-file storage problem is solved together with the access efficiency of associated data. Third, a mapping is established between the original Web data and the attributes extracted from it. Finally, dynamic multi-attribute retrieval is realized on the basis of the existing index on HBase and a distributed local index.

Key words: category storage; multi-condition selectable query; associated mapping; secondary indexing
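
The idea of multi-attribute retrieval through a secondary index, as named in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python dictionaries stand in for the HBase table and its index, and the class name AttributeIndex, the row keys, and the attribute names are all hypothetical. Each (attribute, value) pair maps to the set of row keys that carry it, and a query with several conditions intersects those sets.

```python
from collections import defaultdict


class AttributeIndex:
    """In-memory stand-in (assumption) for a main table plus a secondary index."""

    def __init__(self):
        self._rows = {}                  # row key -> attribute dict ("main table")
        self._index = defaultdict(set)   # (attribute, value) -> set of row keys

    def put(self, row_key, attributes):
        """Store a page's extracted attributes and index each one."""
        # Remove stale index entries when an existing row is overwritten.
        for item in self._rows.get(row_key, {}).items():
            self._index[item].discard(row_key)
        self._rows[row_key] = dict(attributes)
        for item in attributes.items():
            self._index[item].add(row_key)

    def query(self, **conditions):
        """Return sorted row keys that satisfy every attribute=value condition."""
        if not conditions:
            return sorted(self._rows)
        # Start from the smallest candidate set to keep intersections cheap.
        postings = sorted(
            (self._index.get(item, set()) for item in conditions.items()),
            key=len,
        )
        result = set(postings[0])
        for keys in postings[1:]:
            result &= keys
        return sorted(result)


# Hypothetical sample data: three pages with two extracted attributes each.
index = AttributeIndex()
index.put("page-001", {"site": "a.example", "category": "sports"})
index.put("page-002", {"site": "a.example", "category": "finance"})
index.put("page-003", {"site": "b.example", "category": "sports"})

print(index.query(site="a.example", category="sports"))
```

Because the conditions are passed as arbitrary keyword arguments, the set of queried attributes can change from one call to the next, which is the sense in which such a query is "selectable". In a distributed setting the index would be partitioned across nodes rather than held in one dictionary.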