• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Articles •

An extended Winnowing plagiarism detection algorithm

段旭良,杨洋,王曼韬,穆炯   

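  1. (College of Information Engineering, Sichuan Agricultural University, Ya'an, Sichuan 625014)
  • Received: 2015-12-16 Revised: 2016-09-29 Online: 2017-12-25 Published: 2017-12-25
  • Supported by:

    Natural Science Project of the Sichuan Provincial Department of Education (15ZB0017)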

An extended Winnowing plagiarism detection algorithm

DUAN Xu-liang, YANG Yang, WANG Man-tao, MU Jiong

  1. (College of Information Engineering, Sichuan Agricultural University, Ya'an 625014, China)
  • Received: 2015-12-16 Revised: 2016-09-29 Online: 2017-12-25 Published: 2017-12-25

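Abstract:

Plagiarism is a widespread problem in academia and education. Mature commercial plagiarism detection systems are costly in both running time and money, which makes them unsuitable for real-time, lightweight routine checks such as student assignments. We extend the text-fingerprint-based Winnowing plagiarism detection algorithm so that it records the position and length of the corresponding text while extracting each fingerprint, and we give algorithms for fingerprint extraction, text positioning, and merging of plagiarism fingerprint indexes, which together detect, locate, and mark plagiarized text. Experimental results and the algorithm's actual operation in an application system show that the extension has little effect on performance, and that an ordinary hardware configuration is sufficient for small- and medium-scale applications. Building on the original algorithm's light weight, high efficiency, reliability, and flexibility, the extended algorithm broadens the functionality of Winnowing and improves its adaptability and practical value.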

Key words: Winnowing, plagiarism detection, similarity detection, plagiarized text positioning, text fingerprint
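
The idea of recording a position and a length alongside each fingerprint can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-gram size `k`, the window size `w`, the MD5-based `kgram_hash`, the `normalize` step, and the `Fingerprint` record are all assumptions chosen for illustration. The selection rule is the standard Winnowing one (the rightmost minimum hash in each window of `w` consecutive k-gram hashes).

```python
import hashlib
from typing import List, NamedTuple, Tuple


class Fingerprint(NamedTuple):
    hash_value: int  # hash of the selected k-gram
    start: int       # offset of the k-gram's first character in the ORIGINAL text
    length: int      # span length in the original text, including skipped characters


def normalize(text: str) -> Tuple[str, List[int]]:
    """Keep only alphanumeric characters, lowercased, and remember each one's original offset."""
    chars, offsets = [], []
    for i, ch in enumerate(text):
        if ch.isalnum():
            chars.append(ch.lower())
            offsets.append(i)
    return "".join(chars), offsets


def kgram_hash(s: str) -> int:
    # Assumed hash: first 4 bytes of MD5. A rolling hash would be faster.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")


def winnow(text: str, k: int = 5, w: int = 4) -> List[Fingerprint]:
    """Select Winnowing fingerprints and attach original-text position and length."""
    norm, offsets = normalize(text)
    if len(norm) < k:
        return []
    hashes = [kgram_hash(norm[i:i + k]) for i in range(len(norm) - k + 1)]
    fingerprints: List[Fingerprint] = []
    last_selected = -1
    for ws in range(max(1, len(hashes) - w + 1)):
        window = hashes[ws:ws + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum hash within this window.
        idx = ws + max(j for j, h in enumerate(window) if h == smallest)
        if idx != last_selected:  # record each selected k-gram only once
            last_selected = idx
            start = offsets[idx]
            end = offsets[idx + k - 1]  # original offset of the k-gram's last character
            fingerprints.append(Fingerprint(smallest, start, end - start + 1))
    return fingerprints
```

With this layout, `text[fp.start:fp.start + fp.length]` recovers the original passage behind a fingerprint `fp`, punctuation and spacing included, which is what makes marking the text possible later.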

Abstract:

Plagiarism is a common problem in both academia and education. Although commercial plagiarism detection systems are technically mature, their high cost in running time and money keeps them from being adopted for routine, real-time, lightweight tasks such as checking student assignments. We propose an extension of the classic Winnowing plagiarism detection algorithm that records the location and length of each text block while calculating its hash value. The location and length information in the fingerprints can be used to locate and mark plagiarized text blocks in the original documents. We describe algorithms for detection, location, and plagiarism fingerprint index merging using the extended Winnowing, and perform functional and performance experiments to test them. Experiments and actual running results show that the extension affects performance only slightly and that the algorithm can meet the needs of small to medium applications under a general hardware configuration. The extended Winnowing algorithm keeps the original features of high efficiency, reliability, and flexibility, while improving its functionality, practicability, and adaptability.
 

Key words: Winnowing, plagiarism detection, similarity detection, plagiarized text positioning, text fingerprint
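
The locating and marking step mentioned in the abstract can be sketched as follows, under stated assumptions. Fingerprints are taken to be plain `(hash_value, start, length)` tuples, the index is a simple hash-to-document-IDs dictionary, and the function names, the `gap` parameter, and the `[[`/`]]` markers are all hypothetical. The paper's own index-merging algorithm may differ; this only shows one plausible way to merge overlapping matched spans into contiguous regions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Fingerprint = Tuple[int, int, int]  # (hash_value, start, length): assumed layout
Span = Tuple[int, int]              # half-open interval [start, end) in the suspect text


def build_index(corpus: Dict[str, Iterable[Fingerprint]]) -> Dict[int, Set[str]]:
    """Map each fingerprint hash to the IDs of the source documents containing it."""
    index: Dict[int, Set[str]] = {}
    for doc_id, fingerprints in corpus.items():
        for hash_value, _, _ in fingerprints:
            index.setdefault(hash_value, set()).add(doc_id)
    return index


def matched_spans(suspect: Iterable[Fingerprint], index: Dict[int, Set[str]]) -> List[Span]:
    """Spans of the suspect text whose fingerprint hash also occurs in the corpus."""
    return [(start, start + length) for hash_value, start, length in suspect
            if hash_value in index]


def merge_spans(spans: Iterable[Span], gap: int = 0) -> List[Span]:
    """Merge spans that overlap or lie within `gap` characters of each other."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mark(text: str, spans: Iterable[Span], open_tag: str = "[[", close_tag: str = "]]") -> str:
    """Wrap each span of `text` in markers; spans must be sorted and non-overlapping."""
    pieces, pos = [], 0
    for start, end in spans:
        pieces.append(text[pos:start])
        pieces.append(open_tag + text[start:end] + close_tag)
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)
```

A typical call chain would be `mark(text, merge_spans(matched_spans(suspect_fps, index)))`. Merging before marking keeps adjacent fingerprint hits from producing nested or fragmented markers.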