• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 875-884.

• Graphics and Images

Efficient digital halftone calculation based on Floyd-Steinberg error diffusion

LIAN Kaicheng1, YANG Chen1, ZHU Jiawei1, CHAI Zhilei1,2

  1. (1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China;
    2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi, Jiangsu 214122, China)
  • Received: 2023-12-22 Revised: 2024-05-07 Online: 2025-05-25 Published: 2025-05-27
  • Funding:
    National Natural Science Foundation of China (61972180)

Abstract: The Floyd-Steinberg error diffusion algorithm, the mainstream digital halftoning algorithm used in industry, suffers from severe data dependencies, low parallelism, and poor real-time performance as image sizes keep growing. This paper proposes an efficient computation method to address these problems. First, a pre-generated lookup table that maps pixel values to error diffusion values avoids repeatedly computing the error and its diffusion. Second, an efficient data structure based on row buffers optimizes memory access. Third, a single instruction, multiple data (SIMD) method for error accumulation uses the AVX-512 instruction set to accumulate, in parallel, the errors diffused in the same direction for multiple pixels, making fuller use of the CPU's vector registers. Finally, a column-blocking method with edge error limiting enables multi-core data parallelism while eliminating the errors caused by data dependencies at block boundaries. Experimental results show that the optimized algorithm scales well, with computational performance increasing linearly with the optimal number of parallel cores. Compared with the traditional Floyd-Steinberg error diffusion algorithm, it achieves a 15-fold speedup when processing a 5120×5120 grayscale image on a 16-core Intel Core™ i7-11700 CPU platform, completing the task in only 23 ms. This better meets the demands of high-speed industrial printing for large-scale, super-large-format, ultra-high-resolution, and variable-content output.

Key words: digital halftone, Floyd-Steinberg error diffusion, single instruction multiple data, parallel computing
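
The first two optimizations named in the abstract (a pixel-to-error-diffusion lookup table and row buffers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The table layout, the threshold of 128, the integer rounding of the 7/16, 3/16, 5/16 and 1/16 weights, and the names `build_lut`, `halftone`, `OFFSET` and `THRESHOLD` are all assumptions made for illustration. The AVX-512 accumulation and the multi-core column blocking are not shown.

```python
THRESHOLD = 128   # assumed binarization threshold
OFFSET = 255      # shifts possibly negative values to valid table indices


def build_lut():
    """Precompute, for every possible (pixel + accumulated error) value,
    the tuple (output, right, down_left, down, down_right)."""
    lut = []
    for value in range(-OFFSET, 511):
        out = 255 if value >= THRESHOLD else 0
        err = value - out
        right = (err * 7) >> 4        # 7/16 of the error, rounded down
        down_left = (err * 3) >> 4    # 3/16
        down = (err * 5) >> 4         # 5/16
        # The remainder (about 1/16) keeps the four shares summing to err.
        down_right = err - right - down_left - down
        lut.append((out, right, down_left, down, down_right))
    return lut


LUT = build_lut()


def halftone(image):
    """image: list of rows of ints in 0..255. Returns rows of 0 or 255."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Two row buffers hold the errors for the current and the next row.
    # One padding cell on each side removes the need for border checks;
    # error written into the padding is simply discarded.
    cur = [0] * (width + 2)
    nxt = [0] * (width + 2)
    result = []
    for row in image:
        out_row = []
        for x, pixel in enumerate(row):
            # A single table lookup replaces thresholding and four multiplies.
            out, right, dl, d, dr = LUT[pixel + cur[x + 1] + OFFSET]
            out_row.append(out)
            cur[x + 2] += right   # pixel to the right
            nxt[x] += dl          # below-left
            nxt[x + 1] += d       # directly below
            nxt[x + 2] += dr      # below-right
        result.append(out_row)
        cur, nxt = nxt, [0] * (width + 2)   # next row becomes current
    return result
```

For example, `halftone([[128, 128]])` returns `[[255, 0]]`: the first pixel is rounded up to white, and the negative error it passes to the right pushes the second pixel below the threshold. The inner loop also shows the data dependency the abstract refers to, since each pixel needs the error from its left neighbor before it can be processed.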