  • Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • High Performance Computing
      State of the art analysis of China HPC 2025
      ZHANG Yunquan, DENG Li, YUAN Liang, YUAN Guoxing
      2025, 47(12): 2091-2098. doi:
      Abstract ( 45 )   PDF (1066KB) ( 38 )     
      Based on the latest China HPC TOP100 list released by CCF TCHPC in November, this paper presents the overall performance trends of the China HPC TOP100 for 2025. The characteristics of performance, manufacturer, architecture, and application area are then analyzed separately in detail.


      A multi-hybrid encoding digital compute-in-memory macro design
      GUO Ruiqi, YANG Zhuohang, CHEN Xiaofeng, WANG Lei, WANG Yang, HU Yang, YIN Shouyi
      2025, 47(12): 2099-2107. doi:
      Abstract ( 28 )   PDF (1866KB) ( 18 )     
      Compute-in-memory (CIM) is considered a promising solution to the “memory wall” bottleneck, significantly enhancing energy efficiency and area efficiency. This paper proposes a novel digital SRAM-based compute-in-memory (DCIM) macro architecture. It reduces power consumption and improves chip energy efficiency through hybrid encoding of weight data and activation data. Additionally, a series of circuit-level optimizations are applied to the core adder tree circuit to improve the chip’s area efficiency. Under TSMC’s 28 nm process library, the proposed DCIM macro with hybrid encoding optimization improves energy efficiency by 2.17 times at 0.9 V and 250 MHz, using the ResNet20 test model. The adder tree optimization reduces the area of the overall DCIM macro by 14.2%. Finally, a 256×64 DCIM achieves an energy efficiency of 20.83 TOPS/W when processing the ResNet20 model.


      A processor power modeling accuracy improvement method based on static and dynamic sample point reconstruction
      ZHONG Jiaqing, CHEN Juan, ZHOU Yichang, WU Xianyu, WANG Rui, YU Xiang
      2025, 47(12): 2108-2118. doi:
      Abstract ( 21 )   PDF (1688KB) ( 11 )     
      Establishing a high-precision, fine-grained CPU power consumption model is crucial for power management and optimization in computer systems. To address challenges such as the imbalance in the quantity and type distribution of modeling datasets in multi-core processor modeling, this paper proposes a method to enhance processor modeling accuracy based on the reconstruction of static and dynamic program sample points. Program samples are composed of data collected by performance monitoring counters (PMCs) during program execution. The static reconstruction algorithm reconstructs program sample points along three dimensions: feature selection, time granularity refinement, and spatial redundancy reduction. As a complement to the static reconstruction algorithm, the dynamic reconstruction algorithm focuses on the behavior of programs running under various optimization techniques, such as different compilation options or varying resource loads. It selects program samples optimized with appropriate techniques to supplement the program sample points. To evaluate the impact of the static and dynamic sample point reconstruction algorithms on power modeling, this paper assesses five program benchmark suites on x86 and ARM processor platforms. The experimental results show that, when the power consumption models are a linear model, a neural network model, and a random forest model, respectively, the average accuracy improvements are 74.80%, 65.70%, and 32.24% on one x86 platform and 61.61%, 80.44%, and 18.76% on the other. On the ARM platform, the average accuracy improvements for the linear model, neural network model, and random forest model are 22.34%, 34.63%, and 34.36%, respectively.

      Incompressible fluid simulation algorithm optimization of OpenFOAM on Tianhe supercomputing
      LIU Zhongmin, ZHANG Xiang, MA Di, SUN Yang, ZHOU Lei, QIU Qi, GONG Chunye
      2025, 47(12): 2119-2128. doi:
      Abstract ( 26 )   PDF (1933KB) ( 13 )     
      The incompressible fluid simulation solvers in the open-source fluid dynamics software OpenFOAM are applicable across platforms. However, their performance optimizations are predominantly tailored to supercomputing systems with established architectures such as Intel, so their algorithmic optimizations cannot fully exploit the vectorized parallelism of the ARM architecture on the Tianhe supercomputing system. To address this, this paper takes the incompressible fluid simulation solvers as its research subject and employs ARM vectorization techniques to optimize their symmetric Gauss-Seidel (SGS) method and diagonal incomplete Cholesky preconditioned conjugate gradient (DIC-PCG) method, thereby improving solver efficiency. To achieve vectorization, this paper analyzes the relationships between neighboring grid cells during a single iteration of the two solving algorithms, revealing that the maximum number of neighboring cells is two and that there are no dependencies between them. Leveraging this prior knowledge, the original algorithm code is modified at minimal cost (specifically, by adding just four lines of if-else conditional statements) to vectorize the neighboring cells and accelerate the algorithms. Experimental results under various configurations demonstrate that the improved algorithm achieves a maximum single-core speedup of 1.75 and a maximum multi-core speedup of 149.16, with a parallel efficiency still reaching 29.13%.
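For readers unfamiliar with the symmetric Gauss-Seidel method named in this abstract, the following is a minimal dense-matrix sketch in Python. It only illustrates the forward-then-backward sweep; it is not OpenFOAM's sparse implementation and does not include the ARM vectorization described in the paper. The function name and the small test system are illustrative assumptions.

```python
def symmetric_gauss_seidel(A, b, x, sweeps=1):
    """Apply symmetric Gauss-Seidel sweeps to A x = b, updating x in place."""
    n = len(b)
    for _ in range(sweeps):
        # One symmetric sweep: rows 0..n-1 (forward), then rows n-1..0 (backward).
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (b[i] - off_diag) / A[i][i]
    return x


# Hypothetical 2x2 diagonally dominant system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = symmetric_gauss_seidel(A, b, [0.0, 0.0], sweeps=25)
```

Each row update reads the most recent values of its neighbors, which is why the independence of neighboring cells reported in the abstract matters for vectorization.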


      OpenLM: A multi-platform and high-performance large language model inference framework
      LIU Gao, XU Jianliang, ZHANG Xianyi, LIU Xiandong
      2025, 47(12): 2129-2138. doi:
      Abstract ( 21 )   PDF (1260KB) ( 14 )     
      As computational devices continue to diversify and computational power grows rapidly, the increasing number of large language models (LLMs) has made efficient multi-model inference across heterogeneous platforms a complex and formidable challenge. To address this, we propose OpenLM, a high-performance inference framework that supports efficient deployment of multiple LLMs on diverse hardware platforms. OpenLM offers extensive model compatibility and efficient performance support for a wide range of models. It incorporates high-performance computing operators optimized for multiple platforms and architectures to maximize hardware performance. OpenLM also features a flexible architecture that facilitates rapid integration of and support for the latest models. To further optimize memory consumption (both GPU and CPU memory), task scheduling, and system stability during inference, the framework introduces features such as the PagedAttention mechanism, dynamic batching, weight quantization, and KV cache quantization. According to the experimental results, these optimization strategies effectively enhance inference efficiency, reduce resource overhead, and improve overall framework performance.
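The abstract does not say which quantization scheme OpenLM uses. As a generic illustration of what weight or KV cache quantization means, here is a sketch of symmetric per-tensor int8 quantization in Python. The function names and sample values are assumptions, not part of OpenLM.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int8 codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.031, 0.0]  # hypothetical weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each value is stored in one byte plus a shared scale, and the reconstruction error per element is at most half of the scale.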


      Computer Network and Information Security
      Several constructions of (almost) optimally extendable linear codes from MDS codes and NMDS codes
      LI Wenting, HENG Ziling, LI Xiaoru
      2025, 47(12): 2139-2149. doi:
      Abstract ( 16 )   PDF (539KB) ( 10 )     
      In the implementation of block ciphers, side-channel attacks (SCAs) and fault injection attacks (FIAs) are crucial cryptanalysis methods. Let C be a linear code over Fq with a generator matrix G, and let C′ be a linear code over Fq with a generator matrix G′=[G:Ik], where Ik is the identity matrix of order k. If d(C′⊥)=d(C⊥), then C is said to be an optimally extendable linear code; if d(C′⊥)=d(C⊥)-1, then C is said to be an almost optimally extendable linear code. Optimally or almost optimally extendable linear codes effectively protect not only sensitive data stored in registers from SCAs and FIAs but also the entire algorithm. A class of almost optimally extendable linear codes with dimension 5 is constructed from special generator matrices, and its parameters and weight enumerators are obtained. In addition, it is proved that another 4 classes of NMDS (near maximum distance separable) codes with dimension 5 and 2 classes of NMDS codes with dimension 4 are optimally extendable linear codes. In particular, the parameters of the (almost) optimally extendable linear codes are different from those of known (almost) optimally extendable linear codes, and the constructed codes have potential applications in direct sum masking.
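The definition in this abstract can be checked by brute force for very small codes. The sketch below, in Python, assumes a prime field F_p and a full-rank generator matrix given as a list of rows. The helper names and the two toy binary matrices are illustrative and are not the codes constructed in the paper.

```python
from itertools import product


def dual_distance(G, p):
    """Minimum weight of a nonzero x with G x^T = 0 over F_p (brute force)."""
    n = len(G[0])
    best = None
    for x in product(range(p), repeat=n):
        weight = sum(1 for xi in x if xi)
        if weight == 0:
            continue
        if all(sum(g * xi for g, xi in zip(row, x)) % p == 0 for row in G):
            best = weight if best is None else min(best, weight)
    return best  # None when the dual code contains only the zero vector


def extend(G):
    """Build G' = [G : I_k] by appending the k x k identity matrix."""
    k = len(G)
    return [row + [1 if i == j else 0 for j in range(k)]
            for i, row in enumerate(G)]


def classify(G, p):
    d = dual_distance(G, p)
    if d is None:
        raise ValueError("dual code is trivial; d(C-dual) is undefined")
    d_ext = dual_distance(extend(G), p)
    if d_ext == d:
        return "optimally extendable"
    if d_ext == d - 1:
        return "almost optimally extendable"
    return "neither"


repetition = classify([[1, 1, 1]], 2)
two_rows = classify([[1, 1, 0], [0, 1, 1]], 2)
```

The search enumerates all p^n vectors, so it is only practical for toy parameters; it serves to make the two conditions d(C′⊥)=d(C⊥) and d(C′⊥)=d(C⊥)-1 concrete.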

      Research on communication network structure and load configuration in the context of cyber warfare
      LI Yonghui, WU Yuyue, DENG Fengxian, SI Shoukui, ZHAO Wenfei
      2025, 47(12): 2150-2159. doi:
      Abstract ( 17 )   PDF (1540KB) ( 9 )     
      In the context of cyber warfare, the structural design and load configuration of military communication networks should prioritize the prevention and management of node or edge damage. The degree to which communication requirements among network nodes are satisfied is an appropriate metric for evaluating network performance. Therefore, this paper proposes the concept of surplus degree, which considers setting up alternate paths in advance to cope with the loss of network functionality. The goal of network design is to reduce the hop count of communication paths and to increase and balance their surplus degree. This paper addresses the complex military network load configuration problem using a minimal hop count path searching model, which plans paths incrementally, together with a relatively straightforward bi-objective programming model. Compared with existing research on network load configuration, the proposed path and load design algorithm is more in line with the characteristics and requirements of military network design. Numerical simulations compare and verify the algorithm’s universality and superiority in a given network environment. Suggestions are then made to further optimize the current algorithm in order to find the optimal solution and to reduce the computational complexity for complex large-scale networks.
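The abstract does not detail its path searching model. As a rough illustration of minimal hop count routing with an alternate path after node damage, the following Python sketch uses plain breadth-first search on an unweighted graph. The graph, the function names, and the use of BFS are assumptions, not the paper's algorithm, and the sketch does not model load or surplus degree.

```python
from collections import deque


def min_hop_path(adjacency, source, target):
    """Return a path with the fewest hops from source to target, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def without_node(adjacency, damaged):
    """Copy of the graph with one damaged node and its links removed."""
    return {n: [v for v in nbrs if v != damaged]
            for n, nbrs in adjacency.items() if n != damaged}


# Hypothetical five-node network.
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
primary = min_hop_path(network, "A", "E")
backup = min_hop_path(without_node(network, "B"), "A", "E")
```

Here the backup path has the same hop count as the primary one, which is the kind of redundancy that planning alternate paths in advance aims for.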

      A vertical handover algorithm based on fuzzy neural network in heterogeneous wireless network
      LU Qingsong, QIU Yinghui
      2025, 47(12): 2160-2168. doi:
      Abstract ( 19 )   PDF (833KB) ( 11 )     
      To address the vertical handover issue in heterogeneous wireless networks, this paper proposes an algorithm based on hybrid filtering and a fuzzy neural network, taking into account different service types and network-side parameters. Firstly, a combination of a Kalman filter and a moving average filter is employed to process the received signals. Subsequently, the processed received signal strength (RSS), bandwidth, and delay are used as inputs to the fuzzy neural network, which outputs evaluation scores for candidate networks. Finally, the system selects the optimal access network based on the service type and the evaluation scores of the networks. Simulation results demonstrate that the proposed algorithm can make reasonable decisions in complex and dynamic heterogeneous wireless network scenarios, effectively reducing the probabilities of handover failure and service blocking, enhancing system throughput, and ensuring the quality of service (QoS) for users.
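The first stage in this abstract, smoothing RSS with a Kalman filter and a moving average filter, can be sketched as follows in Python. The abstract does not state how the two filters are combined or parameterized, so the cascade order, the static-signal model, the noise variances, and the window size below are all assumptions.

```python
def kalman_filter(measurements, process_var=1e-2, measurement_var=4.0):
    """Scalar Kalman filter assuming a slowly varying signal level."""
    estimate, error_var = measurements[0], 1.0
    smoothed = []
    for z in measurements:
        error_var += process_var                      # predict step
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)             # correct with measurement z
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def moving_average(values, window=3):
    """Trailing moving average; early samples use a shorter window."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def hybrid_filter(rss):
    """Kalman filtering followed by a moving average (assumed order)."""
    return moving_average(kalman_filter(rss))


raw_rss = [-70.0, -60.0, -80.0, -65.0, -75.0]  # hypothetical RSS samples in dBm
filtered_rss = hybrid_filter(raw_rss)
```

The filtered sequence fluctuates far less than the raw samples, which helps avoid handover decisions triggered by momentary RSS spikes.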

      Evaluation of attribute access control policy integrating clustering and structural optimization
      XIA Tong, YUAN Lingyun, XIE Tianyu
      2025, 47(12): 2169-2180. doi:
      Abstract ( 16 )   PDF (2331KB) ( 9 )     
      To speed up responses to user requests for resource access, this paper proposes an evaluation method for attribute-based access control policies that integrates clustering and structural optimization. Firstly, a rule distance weight matrix is constructed to calculate the actual distances between non-numeric rule data points. Secondly, large-scale policy sets are processed using the CKmeans (canopy k-means) two-stage clustering method, which divides them into several small-scale policy clusters to reduce the scope of policy matching. Finally, based on a rule structure optimization and integration approach, the number of rule entries within clusters is compressed, minimizing the number of comparisons between access requests and cluster rules, and a hash cache table is introduced to expedite access for repeated requests. The effectiveness of the proposed method is validated using multiple XACML (extensible access control markup language) access control policies from real-world systems. Experimental results demonstrate that, compared to existing evaluation engines such as Sun’s XACML and Xengine, as well as four types of machine learning methods, the proposed method significantly reduces time overhead across three policy sets (LMS, VMS, and ASMS), with a maximum reduction of approximately three orders of magnitude, greatly enhancing policy evaluation efficiency.


      Graphics and Images
      A facial manipulation adversarial defense method for image post-processing
      XU Kun, QI Shuren, ZHANG Yushu, WEN Wenying, ZHANG Hua
      2025, 47(12): 2181-2194. doi:
      Abstract ( 28 )   PDF (2245KB) ( 13 )     
      Current facial manipulation technologies have advanced to the point where they can easily modify facial attributes, making it difficult for the human eye to distinguish between real and fake images. Facial image data is readily accessible and can be exploited to forge human faces, posing a constant threat to users’ personal privacy and information security. Consequently, leveraging adversarial defense methods to prevent facial images from being manipulated has become an active area of current research. However, most existing methods primarily focus on the defensive effectiveness against adversarial perturbations added to images, lacking in-depth analysis of scenarios where these adversarial perturbations are subsequently disrupted. To address this gap, this paper proposes an adversarial defense method for facial manipulation targeting image post-processing. By conducting a comprehensive and in-depth analysis of original images, images with adversarial perturbations, and images with disrupted adversarial perturbations, an image adversarial defense model based on contrastive learning is constructed. A thorough comparison and evaluation of the proposed adversarial defense method were conducted, and the experimental results demonstrate that the proposed method exhibits effective defense capabilities against facial manipulation.


      A multi-level adversarial mean teacher network for semantic segmentation of nighttime urban landscape
      XU Mengfan, HUANG Wei, GU Zhuoming
      2025, 47(12): 2195-2203. doi:
      Abstract ( 22 )   PDF (1537KB) ( 9 )     
      To address the issue of suboptimal segmentation performance caused by the inadequate adaptability of current methods to nighttime scenes, this paper proposes a multi-level adversarial mean teacher network based on domain adaptation. The proposed method’s segmentation process operates in two stages. Firstly, a curriculum style transfer strategy selects dusk scenes as the target style and transforms both daytime and nighttime images into dusk-style images. This approach decomposes the complex style transfer task into two simpler tasks, facilitating input style alignment. Subsequently, the multi-level adversarial mean teacher network performs adversarial learning at both the feature level and the prediction probability level, achieving domain adaptation between the source and target domains across multiple levels and enhancing the model’s generalization capability across different domains. Additionally, the network employs dynamic class-domain mixing to introduce an extra mixed sample, enabling the model to learn richer dynamic class features. Experimental results demonstrate that the proposed model achieves mIoU of 46.5%, 37.9%, and 47.8% on the Dark Zurich, ACDC, and Nighttime Driving datasets, respectively. These findings indicate that the proposed method effectively improves adaptability to nighttime scenes and enhances segmentation accuracy for nighttime urban landscapes.
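In mean teacher training generally, the teacher's weights are an exponential moving average (EMA) of the student's weights. The Python sketch below shows that update on a plain dictionary of scalar parameters. The decay value and parameter names are assumptions, and the paper's adversarial and mixing components are not represented.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


# Hypothetical single-parameter models.
teacher_weights = {"w": 1.0}
student_weights = {"w": 0.0}
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.9)
```

Because the teacher changes slowly, its predictions on target-domain images tend to be more stable than the student's and can serve as pseudo-labels.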
      An improved YOLOv8 small object detection model in aerial image
      WEI Liumei, LUO Xuemei, KANG Jian
      2025, 47(12): 2204-2215. doi:
      Abstract ( 25 )   PDF (1505KB) ( 13 )     
      To address the issues of low detection accuracy, frequent missed detections, and false detections of small objects in unmanned aerial vehicle (UAV) aerial images, this paper proposes an improved small-object detection model named MDH-YOLOv8. Firstly, the Focal-EIoU loss function replaces the CIoU loss, resolving the problem of inaccurate regression results. A small-object feature extraction module, SAE (self-attention information extraction), is designed to compensate for the insufficient information extraction of the spatial pyramid pooling fast (SPPF) module, enabling the model to focus simultaneously on multiple key small-object regions within the image. Secondly, a C2f_DCN module adaptable to complex geometric deformations is introduced, in which deformable convolutions are fused with multiple iterations of bottleneck layers to enhance the robustness of the detection model. Finally, a dedicated small-object detection head (STDH) module is added to reduce the false detection and missed detection rates of small objects, thereby improving detection accuracy. Experimental results on the VisDrone2019 and DOTA datasets demonstrate that the MDH-YOLOv8 model achieves a 4.2 percentage point increase in mAP@0.5 and a 3.4 percentage point increase in mAP@0.5:0.95 compared to the YOLOv8 model. Compared to mainstream models for small-object detection, the MDH-YOLOv8 model improves detection accuracy for small objects while maintaining a lightweight design.
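As background on the Focal-EIoU loss mentioned above, here is a sketch of its commonly cited form for one pair of axis-aligned boxes: the EIoU loss adds center-distance, width, and height penalties to 1 - IoU, and the focal variant weights it by IoU raised to a power gamma. The box format (x1, y1, x2, y2), gamma = 0.5, and the epsilon are assumptions; the paper's implementation may differ.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    center_dist2 = ((px1 + px2 - tx1 - tx2) ** 2
                    + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    eiou = (1.0 - iou
            + center_dist2 / (cw ** 2 + ch ** 2 + eps)
            + (pw - tw) ** 2 / (cw ** 2 + eps)
            + (ph - th) ** 2 / (ch ** 2 + eps))
    return (iou ** gamma) * eiou


perfect = focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Unlike CIoU, which penalizes aspect ratio, this form penalizes width and height differences directly. Note that the IoU weighting makes the loss zero for non-overlapping boxes in this plain form.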



      A sparrow search algorithm based on hybrid multi-strategy and its application
      LOU Li, ZHANG Huiru
      2025, 47(12): 2216-2226. doi:
      Abstract ( 25 )   PDF (2171KB) ( 12 )     
      The fuzzy c-means (FCM) clustering algorithm has become a popular choice among many scholars for image segmentation due to its simplicity of implementation and its fit with practical scenarios. However, the traditional FCM algorithm has a drawback: random initialization of cluster centers. To select cluster centers appropriately, this paper proposes a hybrid multi-strategy sparrow search algorithm (SSA). By leveraging the strong optimization capability of the SSA, the initial cluster centers of the FCM algorithm are optimized to enhance its segmentation performance. The approach is as follows. First, to address the deterioration of population diversity in the later stages of the SSA, the Fuch chaotic map is introduced. To mitigate the tendency of the sparrow population to oscillate around local extrema, pinhole imaging opposition-based learning is employed to update the positions of discoverers. Additionally, to improve the global search capability of the sparrow population, Gauss-Cauchy mutation is introduced to update the positions of followers. Ultimately, an improved SSA with enhanced optimization accuracy and speed is obtained. The objective function of the FCM algorithm is used as the optimization function of the improved SSA for natural scene and cell image segmentation experiments. Compared to the standard FCM algorithm, the proposed algorithm demonstrates an approximately 5 percentage point improvement in the average partition coefficient and enhanced robustness.
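To make the quantities in this abstract concrete, the sketch below computes the standard FCM memberships, the FCM objective function, and the partition coefficient for one-dimensional data in Python. The sample points, the two candidate center sets, and the fuzzifier m = 2 are assumptions. The sparrow search itself is not implemented; the sketch only shows why the choice of initial centers matters.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM memberships; result[j][i] is point j's degree in cluster i."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            result.append([h / sum(hits) for h in hits])
        else:
            result.append([1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                           for d_i in dists])
    return result


def fcm_objective(points, centers, m=2.0):
    """J = sum over points j and clusters i of u_ij^m * d_ij^2."""
    u = fcm_memberships(points, centers, m)
    return sum((u[j][i] ** m) * (x - c) ** 2
               for j, x in enumerate(points) for i, c in enumerate(centers))


def partition_coefficient(memberships):
    """Mean over points of the sum of squared memberships (1.0 means crisp)."""
    return sum(v * v for row in memberships for v in row) / len(memberships)


points = [0.0, 1.0, 9.0, 10.0]   # hypothetical gray levels
good_centers = [0.5, 9.5]        # close to the two natural groups
poor_centers = [4.0, 6.0]        # a poor initialization
pc_good = partition_coefficient(fcm_memberships(points, good_centers))
pc_poor = partition_coefficient(fcm_memberships(points, poor_centers))
```

Centers near the natural groups give a lower objective value and a partition coefficient closer to 1, which is the direction an optimizer of the FCM objective pushes the initial centers.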


      Artificial Intelligence and Data Mining
      Diversified ranking of search result: Recent progress and prospects
      LI Jinzhong, LIU Weidong, CHEN Shengbo
      2025, 47(12): 2227-2252. doi:
      Abstract ( 26 )   PDF (1291KB) ( 11 )     
      Traditional search engines, which only return sorted relevant search results based on keyword queries, can no longer meet users’ increasingly diverse information needs. To address this, search result diversification ranking technology has emerged. While maintaining query relevance, this technology considers the novelty of different documents and users’ potential diverse query intents, presenting users with ranked results that are more comprehensive and rich in content. With the rapid development and breakthroughs of deep learning technology, deep learning models such as generative adversarial networks (GANs) and graph neural networks (GNNs) have also been widely applied in the field of search result diversification ranking. Although many new research achievements in search result diversification ranking have emerged recently, there is a lack of review work on newly proposed search result diversification ranking methods and other related studies. This paper therefore conducts a relatively comprehensive review of the research progress in search result diversification ranking over the past 5 years. Firstly, it reviews the development status of search result diversification ranking and related reviews, and expounds on the definition of the search result diversification ranking problem. Secondly, it classifies the latest methods of search result diversification ranking in the past 5 years, and focuses on analyzing the representative methods in each category. Thirdly, it elaborates on and analyzes mainstream and novel evaluation metrics for search result diversification, summarizes the existing major datasets for search result diversification, sorts out and analyzes the performance of the latest search result diversification ranking methods, and finally summarizes the application progress of current search result diversification ranking technologies. Furthermore, it looks forward to the future research directions of search result diversification ranking, aiming to provide references for relevant researchers in their studies on search result diversification ranking and promote further development and innovation in this field.


      Domain oriented discontinuous named entity recognition based on large language model
      TANG Jintao, ZHANG Chengxian, BAO Chenlong, LI Wenjing
      2025, 47(12): 2253-2260. doi:
      Abstract ( 30 )   PDF (689KB) ( 14 )     
      In professional fields, the compositional logic between terms is more complex, leading to issues such as complex entities represented by discontinuous named entities. To address the task of discontinuous named entity recognition (DNER), this paper proposes a recognition method that leverages the understanding and generation capabilities of large language models (LLMs). This method treats discontinuous entity recognition as a sentence rewriting task: it designs rules to convert discontinuous named entity recognition datasets into sentence rewriting datasets, and performs output fine-tuning on the large language model. In the named entity recognition phase, based on the rewritten sentences, it designs rule-based instructions using prompt learning, and implicitly prompts the large language model with domain-specific information (e.g., the field of the data) through character role dialogue, which further improves the entity recognition performance. Experimental results show that, compared to the state-of-the-art (SOTA) methods based on small models, this method improves F1 scores by 3.23%, 0.28%, and 1.04% on three datasets: the CSIRO adverse drug event corpus (CADEC), shared annotated resources 2013 (ShARe13), and shared annotated resources 2014 (ShARe14), respectively. These results verify that the generation capability of large models contributes to the complex task of named entity recognition in professional fields.


      Multimodal end-to-end Mongolian speech translation based on multi-task learning and knowledge distillation
      ZANG Richeng, GAO Guanglai, FEI Long
      2025, 47(12): 2261-2268. doi:
      Abstract ( 20 )   PDF (1117KB) ( 9 )     
      End-to-end speech translation technology aims to automatically convert source-language speech into the target language, and has achieved significant progress in multiple fields in recent years. However, its performance in Mongolian speech translation still needs improvement. This challenge mainly stems from the scarcity of Mongolian-Chinese speech translation datasets, which leads to poor performance of existing models on Mongolian speech translation tasks. To overcome these difficulties, this study adopts the following measures. Firstly, a large-scale Mongolian-Chinese parallel speech translation dataset is collected and constructed to support the training of translation models. Secondly, a joint learning strategy is introduced; through parameter sharing between the encoder and decoder, knowledge transfer between speech translation and machine translation tasks is promoted. In addition, to narrow the modal gap between speech and text, a cross-attention regularization method is adopted to enhance the model's ability to understand and utilize inputs of different modalities. Through knowledge distillation, the machine translation model is dynamically updated, which further improves the performance of the speech translation model. Finally, a speech synthesis module is integrated to realize speech-to-speech translation. Experimental results show that the proposed method achieves a significant improvement in translation accuracy: compared with the directly trained speech translation model, its BLEU score increases by nearly 2.00.
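The abstract does not give the form of its distillation objective. A common choice, shown here in Python as an assumption, is the temperature-scaled KL divergence between the teacher's and the student's output distributions. In this setting the teacher would be the machine translation model and the student the speech translation model. The function names, logits, and temperature are illustrative, and the dynamic updating of the teacher mentioned above is not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary for one decoding step.
teacher_logits = [2.0, 0.5, -1.0]
matched = distillation_loss([2.0, 0.5, -1.0], teacher_logits)
mismatched = distillation_loss([-1.0, 0.5, 2.0], teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it transfers the text model's knowledge to the speech model.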

      Batch process fault diagnosis based on ITCN-IDBO-SVM
      LIANG Xiuxia, HE Yueyang, LIU Chong, LIANG Tao
      2025, 47(12): 2269-2280. doi:
      Abstract ( 17 )   PDF (3370KB) ( 13 )     
      To improve the accuracy of batch process fault diagnosis and address the dependence of traditional classifiers on feature extraction, this paper proposes a fault diagnosis model that combines an improved temporal convolutional network (ITCN), an improved dung beetle optimizer (IDBO), and a support vector machine (SVM). Fault diagnosis is divided into two processes: fault feature extraction and classification diagnosis. First, ITCN is used to extract features from batch process data, and the outputs of the fully connected layers are taken as the input to the IDBO-SVM classification layers. Second, IDBO is employed to optimize the parameters of the SVM to enhance the classification accuracy of the model; meanwhile, t-distributed stochastic neighbor embedding (t-SNE) is used for visual analysis to further verify the model’s feature extraction and classification capabilities. Finally, comparative experiments are conducted on the penicillin fermentation process dataset, where the proposed model is compared with the original temporal convolutional network (TCN) and convolutional neural networks (CNNs). The experimental results show that the proposed model not only improves the accuracy of fault identification but also exhibits excellent generalization performance.