• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (06): 1013-1021.

• Computer Networks and Information Security •

Copyright protection of open-sourced datasets based on invisible backdoor watermarking

HUANG Zhi-hui, XIAO Xiang-li, ZHANG Yu-shu, XUE Ming-fu

  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
  • Received: 2023-10-26 Revised: 2023-12-01 Accepted: 2024-06-25 Online: 2024-06-25 Published: 2024-06-17
  • Funding:
    National Natural Science Foundation of China (62072237); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24_0610)

Abstract: To address copyright protection for open-source datasets in image classification, a traceable method based on invisible backdoor watermarking, named IBWOD, is proposed. The method keeps the watermark highly concealed while maintaining good usability and effectiveness. First, an encoder-decoder network embeds the backdoor watermark into a selected subset of samples, generating watermark samples. Next, the labels of these watermark samples are changed to a specified label, and the watermark samples are merged with the unmodified samples to form the watermark dataset. A model trained on this watermark dataset acquires a specific backdoor, i.e., a mapping from the backdoor watermark to the specified label. Finally, a corresponding model verification algorithm is proposed that uses this mapping to verify whether a suspicious model was trained on the watermark dataset. Experimental results show that IBWOD can effectively verify whether a model was trained on the watermark dataset and that the watermark is highly concealed.

Key words: open-sourced dataset, copyright protection, backdoor watermarking, machine learning, image classification
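
The pipeline described in the abstract (embed, relabel, merge, then verify) can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder-decoder network is replaced here by a faint additive pattern purely for illustration, and the verification rule (a threshold on the fraction of watermarked probes classified as the specified label) is an assumption, since the abstract does not spell out the algorithm. All names (`embed_watermark`, `build_watermark_dataset`, `verify_model`, `rate`, `threshold`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Image = List[float]         # flattened image, pixel values in [0, 1]
Sample = Tuple[Image, int]  # (image, label)


def embed_watermark(image: Image, pattern: Sequence[float],
                    strength: float = 0.02) -> Image:
    """Stand-in for the encoder-decoder network (assumption):
    adds a faint fixed pattern and clips the result to [0, 1]."""
    return [min(1.0, max(0.0, p + strength * w))
            for p, w in zip(image, pattern)]


def build_watermark_dataset(dataset: Sequence[Sample],
                            pattern: Sequence[float],
                            target_label: int,
                            rate: float = 0.1,
                            seed: int = 0) -> Tuple[List[Sample], Set[int]]:
    """Watermark a randomly selected fraction of samples, relabel them to
    target_label, and merge them with the unmodified samples."""
    rng = random.Random(seed)
    n_marked = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_marked))
    merged: List[Sample] = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            merged.append((embed_watermark(image, pattern), target_label))
        else:
            merged.append((list(image), label))
    return merged, chosen


def verify_model(predict: Callable[[Image], int],
                 probe_images: Sequence[Image],
                 pattern: Sequence[float],
                 target_label: int,
                 threshold: float = 0.9) -> Tuple[float, bool]:
    """Assumed verification rule: watermark clean probe images and measure
    how often the suspicious model outputs target_label."""
    if not probe_images:
        raise ValueError("probe_images must not be empty")
    hits = sum(1 for image in probe_images
               if predict(embed_watermark(image, pattern)) == target_label)
    success_rate = hits / len(probe_images)
    return success_rate, success_rate >= threshold
```

In this sketch the probe images should be clean samples whose true label differs from the specified label, so that a high success rate reflects the backdoor mapping and not ordinary correct classification. The threshold of 0.9 is arbitrary; a statistical test against a model trained on clean data would be a more careful choice.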