• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal
  • High Performance Computing
    Research advances in acceleration methods for particle transport non-deterministic simulation
    ZHANG Jianmin, XU Weikang, LIU Jinjin, LI Tiejun
    2025, 47(01): 1-9. doi:
    Abstract ( 88 )   PDF (708KB) ( 154 )     
    Particle transport non-deterministic simulation is one of the main applications of high performance computers, and plays an important role in national economic construction and national security. Currently, practical applications such as nuclear numerical simulation, nuclear reactor design, and aerospace have pressing needs for high-precision particle transport non-deterministic simulation, so its acceleration methods have become a research hotspot in the high-performance computing field. In recent years, many contributions have been made to research on particle transport simulation. In this paper, the principle of particle transport non-deterministic simulation is first introduced. Then, the basic flow and pseudocode of a particle transport simulation program are given. Subsequently, the latest software and hardware acceleration methods are summarized. Finally, based on program features extracted from an architecture simulator, the current challenges are discussed, and future research directions based on related work are outlined.
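    The non-deterministic (Monte Carlo) transport flow summarized above, sampling a free-flight distance and then resolving absorption or scattering at each collision, can be illustrated with a toy 1-D slab model. This is an illustrative sketch only, not code from any surveyed program; `simulate_slab` and its parameters are hypothetical names chosen for this example.

```python
import math
import random

def simulate_slab(num_particles, thickness, sigma_t, absorb_prob, seed=1):
    """Toy 1-D Monte Carlo slab transport: each particle streams an
    exponentially distributed free path, then is absorbed or scattered
    at the collision site, until it leaks from either face."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(num_particles):
        x, direction = 0.0, 1.0                    # start at left face, moving right
        while True:
            # Sample free-flight distance from exp(sigma_t); 1 - random() avoids log(0).
            path = -math.log(1.0 - rng.random()) / sigma_t
            x += direction * path
            if x >= thickness:
                transmitted += 1                   # leaked out the right face
                break
            if x <= 0.0:
                break                              # leaked back out the left face
            if rng.random() < absorb_prob:
                break                              # absorbed at the collision
            direction = rng.choice((1.0, -1.0))    # isotropic scatter in 1-D
    return transmitted / num_particles
```

For a purely absorbing slab the transmission estimate converges to exp(-sigma_t * thickness), which gives a quick sanity check on the sampling loop.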

    Structure optimization of second-level Cache in DSP processor
    AN Xinchen
    2025, 47(01): 10-17. doi:
    Abstract ( 61 )   PDF (1263KB) ( 96 )     
    In recent years, emerging applications in fields such as autonomous driving, medical instruments, and smart homes have placed higher demands on the real-time performance and data throughput capabilities of DSP processors. The use of multi-level cache structures in DSPs introduces latency uncertainties due to processes such as cache misses and coherency maintenance. To alleviate the performance degradation caused by long-latency accesses, a method that combines the miss status holding registers and the victim buffer into a single structure is proposed. This structure flexibly allocates the function of its entries at runtime to improve buffer utilization. To address the low synchronization efficiency of coherency maintenance information between the L1 cache and L2 cache, this paper proposes exploiting the continuity between invalidated addresses to synchronize invalidation information to the snoop filter without blocking. Test results show that the performance of a producer-consumer scenario program with many dirty-data updates is improved by 19.91%, and the synchronization time for 32 lines of invalidation information decreases from 61 cycles to 16 cycles.

    Optimization of exponential and logarithm functions for vector units
    SHEN Jie, LONG Biao, HUANG Chun, TANG Tao, PENG Lin
    2025, 47(01): 18-26. doi:
    Abstract ( 52 )   PDF (823KB) ( 109 )     
    Exponential and logarithmic functions are important transcendental functions in floating-point computation, widely used in various application fields. Modern processors exhibit a trend of increasing vector register width with each generation. To further enhance the utilization of vector units by upper-layer applications, researching optimization methods for vector exponential and logarithmic functions holds significant scientific value and practical importance. Addressing the performance bottlenecks of existing vector function implementations, this paper designs and implements optimization methods for exponential and logarithmic functions tailored to vector units. These methods include vector lookup-table optimization based on hardware acceleration instructions, branch optimization, and precision-performance trade-off optimization. Experiments on simulators demonstrate that the optimized vector exponential and logarithmic functions meet industry-standard high precision and outperform the current best open-source implementations, achieving a speedup of over 1.44. Real-world application tests further show that applications can achieve efficient vectorization with the support of the optimized vector functions, resulting in an average performance improvement of 2.53 times over the original scalar implementations.
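    The table-lookup idea behind such implementations can be illustrated with a scalar sketch: reduce the argument against ln2/64, read 2^(j/64) from a small table (on hardware, via a vector gather or lookup instruction), and correct the tiny remainder with a short polynomial. The 64-entry table size and all names below are assumptions for illustration, not the paper's implementation.

```python
import math

LN2 = math.log(2.0)
TBL_BITS = 6
TBL_SIZE = 1 << TBL_BITS                       # 64-entry table of 2^(j/64)
TABLE = [2.0 ** (j / TBL_SIZE) for j in range(TBL_SIZE)]

def table_exp(x):
    """Table-driven exp sketch: x = (q*64 + j)*ln2/64 + r, so
    exp(x) = 2^q * TABLE[j] * exp(r) with |r| <= ln2/128."""
    n = round(x * TBL_SIZE / LN2)              # nearest multiple of ln2/64
    r = x - n * LN2 / TBL_SIZE                 # tiny remainder
    q, j = divmod(n, TBL_SIZE)                 # exponent part and table index
    poly = 1.0 + r * (1.0 + r * (0.5 + r / 6.0))   # cubic approx of exp(r)
    return math.ldexp(TABLE[j] * poly, q)      # scale by 2^q exactly
```

Because the remainder is at most ln2/128, a cubic already reaches roughly 1e-11 relative error, which is the usual precision-performance trade-off such designs exploit.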

    Research and design of clock recovery circuit for Duobinary signal
    YUAN Liangyong, QI Xingyun, L Fangxu, LUO Zhang, HUANG Heng, ZHANG Geng, WANG Wenchen, LI Meng, LAI Mingche
    2025, 47(01): 27-34. doi:
    Abstract ( 39 )   PDF (1862KB) ( 75 )     
    High-speed serial interfaces serve as the interconnect core between chips in high performance computer systems. Addressing the high bandwidth requirements of high-speed serial communication, the design and simulation of a 56 Gbps Duobinary-signal clock and data recovery (CDR) circuit were completed in Verilog-AMS on the Cadence platform. Multi-level signaling reduces the bandwidth demand on the channel. The CDR circuit was designed around a phase interpolator (PI), with the decisions of a Bang-Bang phase detector serving as the basis for phase discrimination. Digital signal processing (DSP) algorithms, including a voting algorithm, a filtering algorithm, and a phase-control-code conversion algorithm, were employed to process the phase detection results. The digital algorithms reduced the complexity of the circuit design, facilitated the adjustment of the loop gain, improved system stability, and decreased the loop delay. Simulation results demonstrate that the CDR circuit can track phase differences and frequency offsets of 100 ppm. With a 0.25 UI sinusoidal jitter added to the input data and a loop bandwidth of 23 MHz, the system can track the sinusoidal jitter as long as the jitter frequency does not exceed the loop bandwidth. The jitter tolerance meets the specifications of the CEI-56G protocol.
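    The voting step in such a digital CDR loop can be sketched behaviorally: a window of bang-bang phase-detector decisions is majority-voted into a single up/down/hold command for the phase interpolator code, which lowers the loop update rate and smooths out single-sample noise. The functions, threshold, and 64-step PI below are illustrative assumptions, not the paper's circuit.

```python
def vote_filter(pd_samples, threshold):
    """Majority-vote a window of bang-bang phase-detector decisions
    (+1 = clock early, -1 = clock late, 0 = no transition) into one
    up/down/hold command."""
    total = sum(pd_samples)
    if total >= threshold:
        return +1            # advance the PI phase code
    if total <= -threshold:
        return -1            # retard the PI phase code
    return 0                 # hold: votes were inconclusive

def track_phase(code, pd_samples, threshold=3, num_steps=64):
    """Apply one voted command to the PI control code, wrapping modulo
    the number of interpolator steps (phase is circular)."""
    return (code + vote_filter(pd_samples, threshold)) % num_steps
```

The modulo wrap models the circular phase of the interpolator: stepping past the last code rolls over to the first, which is how the loop tracks a continuous frequency offset.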

    Fault diagnosis of analog circuits based on Patches-CNN
    WU Yuhong, WANG Jian
    2025, 47(01): 35-44. doi:
    Abstract ( 35 )   PDF (1536KB) ( 74 )     
    Deep learning is widely used in fault diagnosis, but currently, deep learning-based fault diagnosis models for analog circuits are relatively complex and difficult to deploy on edge devices. To address this issue and further improve fault diagnosis accuracy, a simple and lightweight deep learning model for analog circuit fault diagnosis, named Patches-CNN, is proposed. Firstly, the input image is divided into patches and transformed into word vectors (tokens) through a Patch Embedding operator, serving as the input for a ViT-style homogeneous structure. Feature extraction and information acquisition among tokens are carried out using the lightweight operator GSConv, which can effectively enhance the fault diagnosis accuracy of the model. Secondly, layer normalization is added to prevent gradient explosion and accelerate model convergence. To increase the nonlinearity of the model, the GELU activation function is employed. Finally, the Sallen-Key band-pass filter circuit and the Four-Opamp biquad high-pass filter circuit are used as experimental subjects. Experimental results demonstrate that this model can achieve accurate fault classification and location.


    OpenOCD debugging optimization for isomorphic asymmetric multi-core architecture
    TANG Zhu, CHEN Baohai, WANG Jingyu, ZHU Qi
    2025, 47(01): 45-55. doi:
    Abstract ( 48 )   PDF (2097KB) ( 90 )     
    Multi-core architecture is a crucial means of enhancing processor performance, and its application in network processing is becoming increasingly widespread. Consequently, efficient multi-core debugging tools are required to improve the development efficiency of multi-core network processors. Since the network tasks processed on different cores are not strongly correlated, service cores more often adopt the run-to-completion (RTC) processing mode rather than the pipeline mode, and complex cache-coherence logic is unnecessary among the service cores. Therefore, adopting a homogeneous AMP architecture can effectively reduce chip complexity and development costs. Currently, asymmetric multi-core debugging requires launching multiple GDB instances simultaneously, which not only complicates the debugging process but also consumes significant resources. This paper optimizes the OpenOCD multi-port debugging solution for homogeneous asymmetric multi-core scenarios, enabling debugging of multiple asymmetric cores through a single GDB port while also supporting mixed scenarios of symmetric multi-processing core clusters and asymmetric multi-core. Finally, an asymmetric multi-core debugging environment is built on a RISC-V hardware and software platform, and GDB debugging commands such as thread operations, step execution, step over, continue, stack viewing, and breakpoint setting are tested, verifying the feasibility and effectiveness of single-port OpenOCD for asymmetric multi-core debugging.

    A cloud computing virtual machine scheduling strategy based on fuzzy reinforcement learning
    YU Shirui, JIANG Chunmao
    2025, 47(01): 56-65. doi:
    Abstract ( 55 )   PDF (1058KB) ( 89 )     
    Addressing the issue of high energy consumption resulting from inefficient resource management in cloud computing data centers, a fuzzy-based Q-learning(λ) reinforcement learning algorithm is proposed for the virtual machine placement (VMP) problem. The algorithm takes the number of virtual machines in the current state and the utilization rate of physical hosts as input states, feeds them into a fuzzy controller, and combines them with a reinforcement learning (RL) algorithm to execute the corresponding strategies. It dynamically allocates virtual machines to their corresponding physical servers, reducing the number of virtual machine migrations, optimizing resource utilization, and lowering energy consumption while satisfying user service level agreements (SLAs). The algorithm can handle fluctuating workloads and provide appropriate VM deployment (initial or remapped) while meeting the expected quality of service (QoS) requirements of the SLAs. Experimental results show that compared with the Q-learning, Q-learning(λ), Greedy, and PSO placement algorithms, the fuzzy-based Q-learning(λ) algorithm significantly reduces energy consumption and converges faster, demonstrating its practical value.
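    The two building blocks named in the abstract, a fuzzy controller over host utilization and a Q(λ) backup with eligibility traces, can be sketched as follows. The triangular memberships, hyperparameters, and state/action encoding are illustrative assumptions, not the paper's design.

```python
def fuzzify(util):
    """Toy triangular memberships (low, medium, high) for host CPU
    utilization in [0, 1], standing in for the paper's fuzzy controller."""
    low = max(0.0, (0.5 - util) / 0.5)
    high = max(0.0, (util - 0.5) / 0.5)
    med = 1.0 - low - high
    return (low, med, high)

def q_lambda_update(Q, E, s, a, r, s2, alpha=0.1, gamma=0.9, lam=0.8):
    """One Watkins-style Q(lambda) backup: the TD error for greedy
    action-value bootstrapping is spread over all (state, action) pairs
    in proportion to their accumulating eligibility traces."""
    delta = r + gamma * max(Q[s2]) - Q[s][a]
    E[s][a] += 1.0                       # accumulate trace for the visited pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] *= gamma * lam       # decay every trace
    return Q, E
```

In a placement loop, the fuzzified utilization would index the discrete state `s`, and actions would map to candidate hosts; the traces let one migration decision update credit along the whole recent placement history.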

    An image encryption algorithm based on fractional 2D-TFCDM mapping and improved Hilbert curve scrambling
    GAO Yingying, TIAN Ye
    2025, 47(01): 66-74. doi:
    Abstract ( 49 )   PDF (3220KB) ( 86 )     
    To enhance the security of digital images in transmission and address the strong correlation between image pixels and the large amount of data, an image encryption algorithm based on the fractional 2D-TFCDM map and improved Hilbert curve scrambling is proposed. Firstly, pseudo-random sequences are generated by the fractional 2D-TFCDM map. Secondly, the plaintext image is partitioned, and the sub-block images are scrambled by the improved Hilbert curve. To fully weaken the correlation of the image and further improve its scrambling degree, M×N Arnold transforms are performed on the image, with the parameters of each Arnold transform changed. Finally, an XOR operation combined with the average pixel value of the plaintext image is applied for diffusion to obtain the final encrypted image. Three gray-scale images of size 256×256 are tested. The results show that the correlation between the pixels of the encrypted image is weak; the encrypted image exhibits a good encryption effect, good statistical characteristics, and strong anti-interference ability; it can effectively resist various common attacks; and it has good practical value in image encryption.
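    The two scrambling primitives named above can be sketched directly. Below, `hilbert_d2xy` is the standard index-to-coordinate conversion for a Hilbert curve (the paper uses an improved variant), and `arnold` is one parameterized Arnold transform on a square image; both are illustrative, not the paper's code.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve over an n x n grid (n a power
    of two) to grid coordinates; visiting d = 0..n*n-1 scrambles pixels
    along the curve."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def arnold(img, a=1, b=1):
    """One Arnold (cat-map) scramble of a square image with parameters
    a, b: (x, y) -> ((x + a*y) mod n, (b*x + (a*b + 1)*y) mod n)."""
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[(x + a * y) % n][(b * x + (a * b + 1) * y) % n] = img[x][y]
    return out
```

Because both maps are bijections of the pixel grid, iterating them (with varying a, b, as the abstract describes) permutes pixels without losing information, so decryption just applies the inverse maps.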

    Computer Network and Information Security
    Anomaly detection of stream data based on grid density stacking
    WU Peicheng, ZHAO Xujun, JIN Lizhong
    2025, 47(01): 75-85. doi:
    Abstract ( 52 )   PDF (1074KB) ( 87 )     
    Most of the stream data anomaly detection algorithms employ a sliding single-window model, which leads to redundant calculations for a large number of data points and disturbs anomaly points due to the replacement of neighbors in the sliding window, thereby affecting the accuracy of anomaly detection algorithms. To address these issues, a combined window model is proposed, which utilizes several non-overlapping windows as the detection range for anomaly points. Based on this model, an anomaly detection algorithm based on grid density accumulation is introduced. Firstly, the kernel density estimation function is optimized and used to calculate the local density of data points. Then, a grid density accumulation operation is proposed to measure anomalous grids. In anomalous grids, the final anomalous data is determined by calculating the anomaly scores of data points. To improve the algorithm's efficiency, an adaptive pruning strategy is proposed to prune areas where anomaly points are unlikely to appear. Experimental results show that this algorithm exhibits significant advantages in both efficiency and accuracy compared to existing stream data anomaly detection algorithms.
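    The density step described above, estimating each point's local density with a kernel and then accumulating densities per grid cell so that sparse cells surface as candidate anomalous grids, can be sketched in one dimension. The plain Gaussian KDE below stands in for the paper's optimized estimator, and all names are illustrative.

```python
import math

def gaussian_kde(point, data, bandwidth):
    """Plain 1-D Gaussian kernel density estimate at `point`."""
    s = sum(math.exp(-0.5 * ((point - d) / bandwidth) ** 2) for d in data)
    return s / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

def grid_densities(data, cell_width, bandwidth):
    """Bucket points into grid cells and accumulate the mean KDE per
    cell; cells with low accumulated density are candidate anomalous
    grids whose points get scored further."""
    cells = {}
    for d in data:
        cells.setdefault(int(d // cell_width), []).append(d)
    return {c: sum(gaussian_kde(p, data, bandwidth) for p in pts) / len(pts)
            for c, pts in cells.items()}
```

Restricting the per-point anomaly scoring to low-density cells is what makes pruning possible: dense cells can be skipped wholesale, which mirrors the adaptive pruning strategy the abstract mentions.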

    Construction and research of malware knowledge graph
    LUO Yangxia, LI Hao, WU Chenming
    2025, 47(01): 86-94. doi:
    Abstract ( 45 )   PDF (1280KB) ( 83 )     
    In recent years, knowledge graphs have been widely applied in the field of malware analysis, but most scholars have focused on constructing malware API knowledge graphs and using them to detect malicious code. However, the interpretability of API knowledge graphs is relatively weak, and they require a high level of expertise. To address these issues, this paper proposes using a named entity recognition (NER) model to extract text entity information such as malware names and discovery locations, thereby constructing a malware knowledge graph. This graph is then used to discover the diversity, evolution paths, threat methods, and classification associations of malware. Firstly, this paper studies the construction method of a malware knowledge graph, completing data preprocessing, schema layer construction, and data layer construction. Secondly, it identifies and standardizes entities in structured and semi-structured malware data to complete ontology construction (entities, relationships, and additional attributes). Guided by the schema layer, the data layer uses the BERT-BiLSTM-CRF model for knowledge extraction. Finally, the Neo4j graph database is utilized for storing and visualizing the knowledge graph. Simultaneously, the proposed model is validated through simulations using virus database data. Experimental results show that this model outperforms similar models in terms of effectiveness and performance indicators, and it is of great significance for simplifying cybersecurity knowledge and promoting the popularization of defense system knowledge.


    A blockchain-based medical data auditing method
    XU Chao, RUAN Rongyao, CHEN Yong
    2025, 47(01): 95-106. doi:
    Abstract ( 59 )   PDF (1239KB) ( 97 )     
    Medical data serves as the most crucial pillar and driving force in the healthcare system, playing a vital role in the development of the health and medical field. Addressing issues such as medical data falsification and diagnostic errors in hospitals, this paper proposes a medical data audit method based on blockchain technology. From an audit perspective, this method combines the interplanetary file system (IPFS) with blockchain for storing medical data, and utilizes smart contracts for data access and control, ensuring the authenticity, reliability, openness, and transparency of medical data. Furthermore, a deep learning-based medical data audit model with an attention mechanism is designed to automatically extract key features of medical data, aiming to more accurately locate audit clues in medical data. Experimental results demonstrate that the blockchain-based medical data audit method proposed in this paper not only guarantees the reliability of medical data but also improves the accuracy and precision of audit-clue location compared with existing audit models.

    Graphics and Images
    An adaptive Gaussian function dehazing algorithm under channel difference prior
    REN Ruilin, YANG Yan
    2025, 47(01): 107-118. doi:
    Abstract ( 45 )   PDF (3634KB) ( 71 )     
    Addressing issues such as sky region distortion, color bias in results, and incomplete defogging in the process of image dehazing, an adaptive Gaussian function dehazing algorithm based on channel difference prior is proposed. Starting from the essence of degradation in foggy images, a statistical prior reflecting the intrinsic relationship between foggy and haze-free images, namely the channel difference prior, is introduced. Using this prior, a set of equations for foggy and haze-free images is established. The depth of field is approximately estimated using the difference between saturation and brightness of the foggy image. An adaptive standard deviation Gaussian function is designed to solve the equations and obtain the initial transmission map. After normalization, the "fog addition" phenomenon in bright regions is addressed, and joint bilateral filtering is used to further optimize the transmission map. Multi-scale filtering and geometric mean optimization are applied to refine the local atmospheric light, and the dehazed image is obtained by combining the atmospheric scattering model. Experimental results show that the proposed algorithm avoids distortion in the sky region, preserves rich detail information, and achieves significant dehazing effects while maintaining good image color.
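    The final recovery step relies on the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 - t(x)). A per-pixel sketch of the inversion, with the usual lower clamp on t to avoid amplifying noise in dense haze, is shown below; the function names and the Beer-Lambert transmission helper are illustrative, not the paper's adaptive-Gaussian solution.

```python
import math

def transmission_from_depth(depth, beta=1.0):
    """Beer-Lambert transmission t = exp(-beta * d) for scene depth d
    and scattering coefficient beta (a toy stand-in for the transmission
    map the paper estimates from saturation/brightness differences)."""
    return math.exp(-beta * depth)

def dehaze_pixel(i_val, airlight, transmission, t_min=0.1):
    """Invert I = J*t + A*(1 - t) for one normalized channel value;
    t is clamped from below so dense-haze pixels do not blow up, and
    the result is clipped back into [0, 1]."""
    t = max(transmission, t_min)
    j = (i_val - airlight) / t + airlight
    return min(max(j, 0.0), 1.0)
```

With t = 1 (no haze) the pixel passes through unchanged, and as t shrinks the recovered radiance moves away from the airlight value, which is exactly the "haze removal" direction of the model.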

    A road extraction method based on residual attention encoder-decoder network
    QI Ranran, PALIDAN Tuerxun, TANG Bochuan, QIAN Yurong
    2025, 47(01): 119-129. doi:
    Abstract ( 34 )   PDF (1515KB) ( 77 )     
    Addressing the interference caused by similar-shaped objects in remote sensing images during road extraction, a residual attention encoder-decoder network (RAED-Net) is proposed. The encoder network of RAED-Net employs an improved channel attention residual module to extract local and global features from the input image. This module adaptively adjusts the weights of channel feature maps, enhancing the focus on important channel information and reducing background interference. In the decoder network, a strip convolution module is introduced to improve cross-channel information interaction during the upsampling process and enhance the ability to recover detailed road edge information, thereby improving the accuracy of road extraction results in complex environments. Comparative experimental results on two different types of public datasets demonstrate that RAED-Net can accurately extract road information, mitigate the interference caused by similar-shaped objects during road extraction, and achieve the best overall results with the smallest number of parameters. Especially on the mini DGRD dataset, which is fully annotated and highly complex, RAED-Net achieves improvements of 3.53%, 5.76%, and 2.21% in F1-score, IoU, and mIoU, respectively, compared to the second-best network.

    Video anomaly detection with improved attention hybrid auto-encoder
    CHEN Zhaobo, ZHANG Lin, MA Xiaoxuan
    2025, 47(01): 130-139. doi:
    Abstract ( 61 )   PDF (1186KB) ( 89 )     
    Video anomaly detection is one of the important research areas in computer vision, widely applied in fields such as transportation and public safety. However, the current field of video anomaly detection faces issues such as susceptibility to noise interference in individual prediction models and generalization anomalies in individual reconstruction models. To address these problems, a video anomaly detection method combining reconstruction and prediction models is proposed. A reconstruction network with an attention mechanism and a memory enhancement module is trained on normal optical flow data. The reconstructed optical flow and original video frames are then simultaneously input into a future frame prediction network, where the reconstructed optical flow serves as a conditional aid to assist the frame prediction network in better generating future frames. To extract more effective features, a residual convolutional attention module (SRCAM) is proposed to facilitate the reconstruction and prediction networks in effectively learning feature representations of latent spaces at both global and local levels, thereby enhancing the model's ability to detect anomalous events in videos and improving its robustness. Extensive experimental evaluations on two commonly used video anomaly detection datasets, UCSD Ped2 and CUHK Avenue, demonstrate the effectiveness of the proposed method.

    Artificial Intelligence and Data Mining
    A staged strategy incorporating reinforcement learning to solve the travelling thief problem
    ZHANG Zheng, XIA Xiaoyun, CHEN Zefeng, XIANG Yi
    2025, 47(01): 140-149. doi:
    Abstract ( 56 )   PDF (715KB) ( 84 )     
    The travelling thief problem (TTP) is a combination of the traditional traveling salesman problem (TSP) and the knapsack problem (KP), and is NP-hard. Compared with the independent TSP and KP, the TTP is more realistic and has higher research value. Previous TTP solving algorithms are mainly heuristic algorithms with limited performance, and other types of algorithms are less studied. To obtain better solutions for the TTP, a staged strategy incorporating reinforcement learning is proposed. The first stage generates an item selection plan based on the properties of the items. The second stage uses a reinforcement learning algorithm (the Actor-Critic algorithm) to solve for the travel path. The third stage introduces a neighborhood search strategy to optimize the obtained solution. Experiments show that the proposed algorithm achieves good results on most test cases and, in some cases, outperforms the compared algorithms in terms of solution quality, demonstrating its superior performance.
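    The first and third stages can be sketched with textbook components: a greedy profit/weight item selection for the knapsack side, and one pass of 2-opt neighborhood search over the tour. These are generic stand-ins for the paper's item-scoring and neighborhood strategies; all names and the greedy criterion are illustrative assumptions.

```python
def select_items(items, capacity):
    """Stage 1 sketch: greedily pick (profit, weight) items by
    profit/weight ratio until the knapsack capacity is exhausted."""
    chosen, load = [], 0
    for idx, (profit, weight) in sorted(
            enumerate(items), key=lambda t: t[1][0] / t[1][1], reverse=True):
        if load + weight <= capacity:
            chosen.append(idx)
            load += weight
    return sorted(chosen)

def two_opt_once(tour, dist):
    """Stage 3 sketch: one 2-opt sweep; reverse a tour segment whenever
    the reconnected edges are shorter than the originals."""
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            a, b = tour[i - 1], tour[i]
            c, d = tour[j], tour[(j + 1) % n]
            if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                tour[i:j + 1] = reversed(tour[i:j + 1])
    return tour
```

In the TTP the two parts interact through the thief's load-dependent speed, which is why the staged design fixes the item plan first and then lets the learned tour and the neighborhood search adapt to it.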

    An attention-guided dual-granularity cross-modal medical representation learning framework
    CHEN Xinran, LIU Ning, YAN Zhongmin, LIU Lei, CUI Lizhen
    2025, 47(01): 150-159. doi:
    Abstract ( 56 )   PDF (1303KB) ( 91 )     
    Deep learning has achieved significant results in medical imaging diagnosis, and models based on deep neural networks can effectively assist doctors in making decisions. However, as model parameter scales grow, large-parameter models in the medical domain increasingly face the challenge of data scarcity, since high-quality medical image data must be labeled manually by professional physicians. One solution is to introduce paired medical reports to guide training with medical images, which involves the interaction of two modalities. However, cross-modal alignment methods from the general domain do not capture detailed information well and cannot be fully applied to the medical domain. To address this issue, an attention-guided dual-granularity cross-modal medical representation learning framework, ADCRL, is proposed to align medical images and reports at both coarse and fine granularity. ADCRL extracts features from medical images and reports at two granularities, uses an attention-guided module to select image regions of interest for medical tasks and remove noisy regions, and aligns the two modalities at different granularities through contrastive-learning-based proxy tasks. ADCRL trains models under an unsupervised paradigm to understand the global and detailed semantics of the two modalities, and demonstrates excellent performance in downstream tasks using only limited annotated data. The main work includes proposing a fine-grained feature selection method and a dual-granularity cross-modal feature learning framework, and pretraining and validating the effectiveness of the framework on publicly available medical datasets.

    WiFi-based human activity recognition using cross-sequence prediction and consistency comparison
    WANG Yang, XU Jiawei, WANG Ao, SONG Shijia, XIE Fan, ZHAO Chuanxin, JI Yimu
    2025, 47(01): 160-170. doi:
    Abstract ( 47 )   PDF (2883KB) ( 122 )     
    With the release of the IEEE 802.11bf standard, WiFi sensing technology has been transitioning from academic research to industrial applications. Addressing the issue that existing WiFi-based human activity detection systems often rely on strong assumptions, this paper proposes a self-supervised model, CPCC-Fi, tailored for the field of WiFi sensing, starting from how to fully utilize unlabeled channel state information (CSI) samples. Based on the idea of contrastive learning, the model first employs sequential data augmentation to generate unlabeled CSI samples with different views. It then acquires the intrinsic representation features of the CSI sequences through self-supervised learning. After fine-tuning the model with a small number of labeled samples, effective perception and recognition of downstream human activities can be achieved. Experiments conducted on both self-collected and public datasets demonstrate that the CPCC-Fi model outperforms CNN+Linear, CNN+Transformer+Linear, and TS-TCC in terms of performance.

    A hybrid strategy improved dung beetle optimization algorithm
    GAO Jiyuan, LIU Jie, CHEN Changsheng, LI Wei, LIU Ying, YANG Jing
    2025, 47(01): 171-179. doi:
    Abstract ( 48 )   PDF (1005KB) ( 80 )     
    The dung beetle optimizer (DBO) is a novel global optimization meta-heuristic algorithm characterized by strong optimization capability and fast convergence. However, it is also prone to local optima and low convergence accuracy. To address these issues, this paper proposes a hybrid strategy improved dung beetle algorithm (HSIDBO). Firstly, an improved Logistic chaotic map is used for population initialization to obtain a more uniformly distributed population. Secondly, an adaptive optimal guidance strategy is adopted to increase the algorithm's convergence speed and local contraction ability. Finally, a lens imaging learning strategy is introduced to improve the dung beetle's theft behavior, thereby enhancing the algorithm's ability to escape local optima. Tests were conducted on 14 classic benchmark functions and on engineering application problems. The results demonstrate that the integration of these three strategies effectively enhances the performance of the dung beetle optimizer.
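    Two of the three strategies can be sketched compactly: Logistic-map chaotic initialization spreads the starting population over the bounds, and lens-imaging opposition-based learning reflects a stuck solution about the midpoint of the bounds. The map parameter, seed, and scale factor k below are illustrative choices, not the paper's tuned values.

```python
def logistic_chaos_init(pop_size, dim, lb, ub, mu=4.0, seed=0.7):
    """Chaotic initialization sketch: iterate the Logistic map
    x <- mu * x * (1 - x) and scale the sequence into [lb, ub]."""
    pop, x = [], seed
    for _ in range(pop_size):
        row = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)       # chaotic in (0, 1) for mu = 4
            row.append(lb + x * (ub - lb))
        pop.append(row)
    return pop

def lens_opposition(x, lb, ub, k=2.0):
    """Lens-imaging opposition-based learning: reflect each coordinate
    about the bound midpoint with scale factor k,
    x* = (lb + ub)/2 + (lb + ub)/(2k) - x/k; k = 1 recovers plain
    opposition x* = lb + ub - x."""
    return [(lb + ub) / 2.0 + (lb + ub) / (2.0 * k) - xi / k for xi in x]
```

In an HSIDBO-style loop, the lens-imaging candidate would replace the current thief position only if its fitness improves, which is what gives the escape from local optima without losing good solutions.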

    Document-level neural machine translation based on rhetorical structure
    JIANG Yunzhuo, GONG Zhengxian
    2025, 47(01): 180-190. doi:
    Abstract ( 43 )   PDF (719KB) ( 90 )     
    Despite years of development and significant progress in document-level neural machine translation, most efforts have focused on building effective network structures from a model perspective by utilizing contextual word information, neglecting the guidance that cross-sentence discourse structure and rhetorical information can provide to the model. Addressing this issue, under the guidance of Rhetorical Structure Theory, a method for separately encoding discourse units and rhetorical structure tree features is proposed. Experimental results show that the proposed method enhances the encoder's ability to represent discourse structure and rhetorical information. The improved model surpasses several high-quality baseline models, achieving notable improvements in translation performance across multiple datasets. Additionally, significant improvements in translation quality are demonstrated through the proposed quantitative evaluation method and human analysis.



Contact us

    Office location: 109 Deya Rd, Changsha, Hunan
    Post code: 410073
    Telephone: 0731-87002567
    E-mail: jsjgcykx@vip.163.com
    Website: http://manu46.magtech.com.cn/ces