• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

A novel big data order-preserving matching algorithm based on similarity filtration
 

JIANG Wen-chao 1, LIN De-xi 1, SUN Ao-bing 2, WU Xiao-qiang 2

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  2. Institute of Guangdong Electronics Industry, Dongguan 523808, China

     
  • Received: 2016-12-26 Revised: 2017-03-21 Online: 2017-07-25 Published: 2017-07-25

Abstract:

Order-preserving data matching is a key problem in big data applications. Through abstraction or reduction, data matching can be transformed into character or number matching. We present a novel order-preserving matching algorithm based on similarity filtration that comprises three steps: data transformation, data reduction, and similarity computation. First, to capture convex or concave growth (or descent), the data are transformed into a binary string according to the relationship among each three neighboring numbers. Second, to compute similarity more accurately, the data array and the pattern array are both scaled into the fixed interval [0,1]. Finally, the similarity is computed from the range of variation of the corresponding nodes in the data array and the pattern array, and the results are sorted. Theoretical analysis shows that the time complexity is O(n), which is lower than that of the algorithm presented by Cho et al. Furthermore, our algorithm overcomes two deficiencies of the algorithm of Cho et al.: uncontrollable min-max values and subsection inconsistency. Based on the computed similarity, all substrings can be ranked for data retrieval or search in big data applications.
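
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact encoding rule or similarity formula, so everything here is an assumption made for illustration: the binary string uses the sign of the second difference of three neighboring values (1 for convex, 0 for concave), reduction is plain min-max scaling, and similarity is one minus the mean absolute difference of the scaled values. The function names `to_binary`, `normalize`, `similarity`, and `ranked_matches` are hypothetical. The sliding-window loop is written for clarity and does not attempt to reach the O(n) bound stated in the abstract.

```python
from typing import List, Sequence, Tuple


def to_binary(values: Sequence[float]) -> str:
    """Encode each interior point from its three neighboring values.

    Assumed rule (not from the paper): '1' if the second difference is
    non-negative (convex), '0' otherwise (concave).
    """
    return "".join(
        "1" if values[i - 1] - 2 * values[i] + values[i + 1] >= 0 else "0"
        for i in range(1, len(values) - 1)
    )


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale the values into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant sequence has no spread; map it to all zeros.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def similarity(window: Sequence[float], pattern: Sequence[float]) -> float:
    """Assumed measure: 1 minus the mean absolute gap of the scaled values."""
    a, b = normalize(window), normalize(pattern)
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def ranked_matches(
    data: Sequence[float], pattern: Sequence[float]
) -> List[Tuple[int, float]]:
    """Return (start index, similarity) pairs, most similar first."""
    m = len(pattern)
    target = to_binary(pattern)
    hits = []
    for start in range(len(data) - m + 1):
        window = data[start:start + m]
        # Filtration: score only windows whose binary shape matches.
        if to_binary(window) == target:
            hits.append((start, similarity(window, pattern)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    data = [1, 3, 2, 5, 10, 30, 12, 50]
    pattern = [1, 3, 2, 5]
    for start, score in ranked_matches(data, pattern):
        print(start, round(score, 3))
```

In this sketch the binary comparison acts as a cheap filter, so the similarity score is computed only for windows that already share the pattern's shape, and the surviving windows are then ranked by that score.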

Key words: big data application, pattern matching, order-preserving matching, similarity filtration