• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (1): 103-107.

• Articles •

A Text Similarity Matrix OperationBased Classification Algorithm for Largescale Unstructured Complaint Data

LI Qing1, CHEN Yang2, XIE Haoran1, MENG Shengguang3

  (1. Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR 999077;
    2. China Mobile Corporation Guangxi Co. Ltd., Nanning 530000;
    3. Faster Software Technology Co. Ltd., Zhuhai 519080, China)
  • Received: 2010-05-20 Revised: 2010-10-26 Online: 2012-01-25 Published: 2012-01-25

Abstract:

With the rapid development of the Internet and information technology, the volume of unstructured data is growing exponentially, and the prevalence of Web 2.0 online communities further accelerates this growth. How to manage and organize large-scale unstructured data effectively, so as to facilitate end-user access to information, has therefore become an urgent and important research topic. In this paper, building on text modeling of unstructured data and on text similarity, we survey and discuss existing classification algorithms for large-scale unstructured data and apply them to a user complaint data classification system for China Mobile. With this system, the effectiveness of complaint data processing is shown to be much improved, and the usefulness of our proposed classification algorithm and system architecture is verified.

Key words: text similarity; unstructured data; complaint data classification system