• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

J4 ›› 2012, Vol. 34 ›› Issue (3): 113-117.

• 论文 • 上一篇    下一篇

基于双字符序检测的BM模式匹配改进算法

王浩,张霖,张庆   

  1. (安徽建筑工业学院信息网络中心,安徽 合肥 230022)
  • 收稿日期:2011-10-17 修回日期:2011-12-25 出版日期:2012-03-26 发布日期:2012-03-25
  • 基金资助:

    安徽高校省级自然科学研究重点项目(KJ2009A61);安徽高校省级自然科学研究一般项目(KJ2010B041)

An Improved BM Pattern Matching Algorithm Based on Double Character Sequence Checking

WANG Hao,ZHANG Lin,ZHANG Qing   

  1. (Information and Network Center,Anhui Institute of Architecture and  Industry,Hefei 230022,China)
  • Received:2011-10-17 Revised:2011-12-25 Online:2012-03-26 Published:2012-03-25

摘要:

BM算法是一类效率较高的单模式匹配算法,通常改进的BM算法往往从提高字符首次不匹配概率和匹配窗口的最大移动距离入手,但为实现此目的所带来的高访存开销使算法实际效率受到影响。DCSBM算法以适当减小关键步长为代价,在利用双字符序检测提高首次匹配失败概率的同时,对匹配窗口移动关键步长字符距离所需的查表次数和访存次数进行优化。经测试,DCSBM算法显著提高了匹配窗口的平均移动距离。在文本或模式串相对较长情况下,该算法实际测试效率优于BM、BMHS、BMN等算法。

关键词: 模式匹配, 双字符序, BM算法, BMHS算法

Abstract:

The BM algorithm is a  more efficient single pattern matching algorithm. The common method of improving the BM algorithm is to start with increasing the probability of the first failure character matching and the maximal moving distance of the matching window. At the same time, the higher cost of accessing memory counteracts the actual efficiency of the new algorithm. Optimizing the number of the lookup table and the times of accessing memory when it moves the key distance of the matching window, and DCSBM makes good use of the double character’s first failure matching probability at the cost of reducing the key steps properly. It is tested that DCSBM obviously increases the average moving distance of the matching window. In larger length of the text or pattern, the real efficiency of DCSBM is higher than BM, BMHS, or BMN algorithm.

Key words: pattern matching;double character sequence;BM algorithm;BMHS algorithm