• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (10): 1789-1796.

• Computer Network and Information Security •

Text steganography based on generative adversarial networks and multi-head attention

HUANG Yao, PAN Li-li, XIONG Si-yu, JIANG Xiang-hui, MA Jun-yong

  1. (School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China)
  • Received: 2022-12-24 Revised: 2023-04-18 Accepted: 2023-10-25 Online: 2023-10-25 Published: 2023-10-17
  • Supported by:
    Natural Science Foundation of Hunan Province (2021JJ31164); Key Project of the Education Department of Hunan Province (22A0195)

Abstract: With the development of deep learning, steganography based on text generation has made significant breakthroughs. Existing text-generation-based methods suffer from exposure bias: during training, each input comes from the ground-truth sample labels, whereas during prediction each input is the output predicted at the previous time step. This mismatch between training and prediction inputs causes errors to accumulate, so the distribution of generated samples drifts far from that of real samples. To address this problem, this paper proposes TS-GANMA, a text steganography method based on generative adversarial networks and multi-head attention. First, a text generator is trained with a generative adversarial network, and the multi-head attention scores extracted by a multi-head attention mechanism take part in the reward computation of the reward module, yielding feedback better suited to the generator. Next, the generator and discriminator are trained adversarially, which can solve the exposure bias problem and optimize the text generation model. Finally, the conditional probability distribution output by the text generation model is encoded to embed the secret information. Experimental results show that, at the same embedding rate, TS-GANMA yields significantly lower perplexity of the steganographic text than LSTM-vlc and ADG. This is because the steganographic text generated by TS-GANMA fits the statistical distribution of real text more closely and is therefore of higher quality.

Key words: text steganography, exposure bias, generative adversarial network, multi-head attention
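The last step of the abstract, encoding the model's conditional probability distribution to embed secret bits, can be illustrated with a small sketch. The abstract does not say which coding scheme TS-GANMA uses, so the example below swaps in a simple fixed-length scheme: at each step the candidates are ranked by probability and the next few secret bits select one of the top 2^k tokens. The vocabulary, the toy model `next_token_probs`, and the functions `embed` and `extract` are all hypothetical names introduced for illustration; a trained generator would take the place of the toy model.

```python
import math

# Hypothetical toy vocabulary; a real system would use the generator's vocabulary.
VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat", "fast"]

def next_token_probs(prev):
    """Toy stand-in for a language model: a deterministic P(token | prev)."""
    # The scores depend only on the previous token, so sender and receiver
    # can both recompute exactly the same distribution.
    seed = sum(ord(c) for c in prev)
    scores = [((seed * (i + 3)) % 17) + 1 for i in range(len(VOCAB))]
    total = sum(scores)
    return [s / total for s in scores]

def candidates(prev, bits_per_token):
    """Return the indices of the top 2^k tokens (ties broken by index) and all probabilities."""
    probs = next_token_probs(prev)
    ranked = sorted(range(len(VOCAB)), key=lambda i: (-probs[i], i))
    return ranked[: 2 ** bits_per_token], probs

def embed(bits, bits_per_token=2, start="<s>"):
    """Hide a bit string by letting each k-bit chunk choose the next token."""
    assert bits and len(bits) % bits_per_token == 0
    prev, tokens, log_prob = start, [], 0.0
    for k in range(0, len(bits), bits_per_token):
        pool, probs = candidates(prev, bits_per_token)
        idx = pool[int(bits[k:k + bits_per_token], 2)]
        tokens.append(VOCAB[idx])
        log_prob += math.log(probs[idx])
        prev = VOCAB[idx]
    # Perplexity of the generated text under the same toy model.
    perplexity = math.exp(-log_prob / len(tokens))
    return tokens, perplexity

def extract(tokens, bits_per_token=2, start="<s>"):
    """Recover the bit string by recomputing each token's rank in its pool."""
    prev, bits = start, ""
    for tok in tokens:
        pool, _ = candidates(prev, bits_per_token)
        rank = pool.index(VOCAB.index(tok))
        bits += format(rank, f"0{bits_per_token}b")
        prev = tok
    return bits

if __name__ == "__main__":
    secret = "0110110001"
    stego, ppl = embed(secret, bits_per_token=2)
    print(stego, round(ppl, 2))
    print(extract(stego, bits_per_token=2) == secret)
```

In this sketch `bits_per_token` plays the role of the embedding rate: a larger pool hides more bits per word but forces the choice of less likely tokens, which tends to raise perplexity. That trade-off is why the abstract compares methods at the same embedding rate.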