  • Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • High Performance Computing
      A high-precision oscillator noise analysis model of ISF based on PSS+PXF
      YUAN Heng-zhou, SANG Hao, LIU Sheng, CHEN Xiao-wen, YAN Guang-da, GUO Yang
      2024, 46(06): 951-958. doi:
      Abstract ( 129 )   PDF (1160KB) ( 302 )     
      This paper proposes an ISF phase noise model based on PSS+PXF to predict the phase noise of an oscillator. Unlike the traditional Razavi model, the proposed model takes the nonlinear time-varying characteristics of the oscillator into account and is therefore more accurate. Simulation verifies the effectiveness of the model: its precision is more than 200% higher than that of the traditional Razavi model.

      Exploration of the many-core data flow hardware architecture based on Actor model
      ZHANG Jia-hao, DENG Jin-yi, YIN Shou-yi, WEI Shao-jun, HU Yang
      2024, 46(06): 959-967. doi:
      Abstract ( 90 )   PDF (1750KB) ( 226 )     
      The distributed training of ultra-large-scale AI models challenges the communication capability and scalability of chip architectures. Wafer-level chips integrate a large number of computing cores and interconnect networks on the same wafer, achieving ultra-high computing density and communication performance, which makes them an ideal choice for training ultra-large-scale AI models. AMCoDA is a hardware architecture based on the Actor model. It aims to leverage the high parallelism, asynchronous message passing, and scalability of the Actor parallel programming model to achieve distributed training of AI models on wafer-level chips. The design of AMCoDA comprises three levels: computational model, execution model, and hardware architecture. Experiments show that AMCoDA supports a wide range of parallel patterns and collective communications in distributed training, and can flexibly and efficiently deploy and execute complex distributed training strategies.

      Design of independent software stack of FT-Matrix DSP
      SHI Yang, CHEN Zhao-yun, SUN Hai-yan, WANG Yao-hua, WEN Mei, HU Xiao
      2024, 46(06): 968-976. doi:
      Abstract ( 200 )   PDF (1494KB) ( 259 )     
      FT-Matrix DSP (Digital Signal Processor) is a high-performance digital signal processor designed independently by the College of Computer Science and Technology, National University of Defense Technology, to break through key technologies and resolve the long-standing dependency on foreign-made DSPs in key areas in China. Because the FT-Matrix chip series uses a self-designed instruction set, it is not compatible with existing software. A self-sufficient, complete, and efficient software stack is therefore crucial to the vitality of FT-Matrix DSP. Based on the team's accumulated work, this paper systematically explains the design principles and hierarchical architecture of the FT-Matrix DSP software stack, focusing on the innovative functions, implementation methods, and performance of the related software tools in the support, compilation, and tool layers. Drawing on user feedback and the team's insights, this paper also discusses issues that need to be explored in the future development of the FT-Matrix DSP software stack.

      Design of high-speed BGA and PCB transmission structure for extended Chiplet application
      CHEN Tian-yu, LI Chuan, WANG Yan-hui
      2024, 46(06): 976-983. doi:
      Abstract ( 74 )   PDF (1667KB) ( 223 )     
      For interconnect design in extended Chiplet applications, this paper studies analysis methods and optimization measures for via crosstalk in the BGA region. Firstly, modeling and calculation on unit array vias is proposed to evaluate the crosstalk of vias across the whole BGA area. Then, a multi-layer fan-out modeling platform is constructed to meet the analysis requirements of different wiring layers. Results from several unit array models and the multi-layer fan-out model verify each other, indicating that taking the array unit as the minimum part for crosstalk evaluation is accurate, and that the multi-layer fan-out modeling method is efficient and feasible. PCB via crosstalk for two different BGA pin assignments is analyzed by multi-layer fan-out modeling. Results show that increasing the ratio of the spacing between neighboring signal vias to the spacing between a signal via and its neighboring ground via is more effective than increasing the number of ground vias or the BGA pitch.

      Dense linear solver on many-core CPUs: Characterization and optimization
      FU Xiao, SU Xing, DONG De-zun, QIAN Cheng-dong
      2024, 46(06): 984-992. doi:
      Abstract ( 96 )   PDF (1739KB) ( 180 )     
      The dense linear solver plays a vital role in high-performance computing and machine learning. Typical parallel implementations are built upon the well-known fork-join or task-based programming model. Though mainstream dense linear algebra libraries adopting the fork-join paradigm can shift most of the computation to well-tuned and high-performance BLAS 3 routines, they fail to exploit many-core CPUs efficiently due to the rigid execution stream of fork-join. While open-source implementations employing the task-based paradigm can provide more promising performance thanks to the model's malleability and better load balance, they still leave much room for optimization on many-core platforms, especially for medium-sized matrices. In this paper, a quantitative characterization of the dense linear solver is carried out to locate performance bottlenecks, and a series of optimizations is proposed to deliver higher performance. Specifically, idle threads are reduced by merging LU factorization with the following lower triangular solver to improve parallelism. Moreover, duplicated matrix packing operations are reduced to lower memory overhead. Performance evaluation is conducted on two modern many-core platforms, Intel Xeon Gold 6252N (48 cores) and HiSilicon Kunpeng 920 (64 cores). Evaluation results show that our optimized solver outperforms the state-of-the-art open-source implementation by up to 10.05% on the Xeon and 13.63% on the Kunpeng 920.

      Research on wafer-scale chip mapping task based on genetic algorithm
      LI Cheng-ran, FANG Jia-hao, YIN Shou-yi, WEI Shao-jun, HU Yang
      2024, 46(06): 993-1000. doi:
      Abstract ( 99 )   PDF (1212KB) ( 213 )     
      In recent years, with the development of artificial intelligence, deep learning has become one of the most important computing workloads. The next generation of artificial intelligence (AI) and high-performance computing applications places unprecedented demands on the computing power and communication capabilities of computing platforms. Wafer-scale chips integrate ultra-high-density transistors and interconnect communication capabilities on an entire wafer, so they are expected to provide revolutionary computing power solutions for future AI and supercomputing platforms. However, the huge computing resources and unique new architecture of wafer-scale chips pose unprecedented challenges to task mapping algorithms, and related work has become a major focus of academic research in recent years. This paper studies methods for mapping AI tasks onto wafer-scale hardware resources. By expressing the AI algorithm as multiple convolutional kernels and considering the computational power characteristics of convolutional kernels, a mapping algorithm for wafer-scale chips is designed based on genetic algorithms. Simulation results for a series of mapping tasks verify the effectiveness of the mapping algorithm and reveal the impact of parameters such as execution time and adapter cost on the cost function.

      Computer Network and Information Security
      Exact repair regeneration code data repair scheme under bandwidth heterogeneous networks
      WANG Yan, PI Chan-juan, LIU Ya-dong, SHI Jun-hao
      2024, 46(06): 1001-1012. doi:
      Abstract ( 82 )   PDF (1052KB) ( 165 )     
      Regeneration codes are widely used in data storage because of their high fault tolerance and low redundancy overhead, but redundancy techniques based on regeneration codes need to download multiple coded blocks from other providers to repair failed data. Because link bandwidth between nodes is heterogeneous and the available bandwidth capacity of links varies greatly in real networks, minimizing network traffic does not necessarily minimize regeneration time. Moreover, existing regeneration code repair schemes for bandwidth heterogeneity have difficulty supporting exact repair regeneration codes. Due to the specific mathematical structure of exact repair regeneration codes, their parallel repair is difficult to achieve. Therefore, ERC-TREE is proposed as a repair framework for exact repair regeneration codes under bandwidth heterogeneous networks. This framework efficiently exploits the available bandwidth between providers by constructing an optimal tree to achieve exact repair of failed node data. Simulation experiments show the feasibility of tree repair for exact repair regeneration codes in heterogeneous bandwidth environments. In scenarios with a significant difference in bandwidth, ERC-TREE reduces the data repair time by 78% compared to star repair.

      Copyright protection of open-sourced datasets based on invisible backdoor watermarking
      HUANG Zhi-hui, XIAO Xiang-li, ZHANG Yu-shu, XUE Ming-fu
      2024, 46(06): 1013-1021. doi:
      Abstract ( 94 )   PDF (1264KB) ( 187 )     
      To address the copyright protection issue in the field of image classification datasets, a traceable method based on invisible backdoor watermarking, named IBWOD, is proposed. This method ensures the watermark’s strong concealment while maintaining good usability and effectiveness. Firstly, an encoder-decoder network is used to embed the backdoor watermark into selected samples, generating watermark samples. Secondly, the labels of these watermark samples are modified to specified labels, and then the watermark samples are merged with unmodified samples to form a watermark dataset. Models trained using this watermark dataset will leave a specific backdoor, i.e., a mapping relationship from the backdoor watermark to the specified labels. Finally, a corresponding model verification algorithm is proposed, based on this special mapping relationship, to verify if a suspicious model has used the watermark dataset. Experimental results demonstrate that IBWOD can effectively verify whether a model has used the watermark dataset and possesses strong concealment. 

      An intrinsic secure open shortest path first protocol based on identity cryptography
      XUN Peng, CHEN Hong-yan, WANG Yong-zhi, LI Shi-jie
      2024, 46(06): 1022-1031. doi:
      Abstract ( 65 )   PDF (1068KB) ( 172 )     
      Routing protocols such as Open Shortest Path First version 2 (OSPFv2), a TCP/IP internet routing protocol, play a crucial role in the connectivity and secure transmission of information within networks. However, traditional OSPFv2 cannot defend against source route spoofing or route information tampering, leaving networks vulnerable to attacks. Existing security strategies are often add-on solutions, which can introduce new security issues or offer limited protection. To address this, a novel OSPFv2 protocol based on identity-based cryptography is proposed. This protocol embeds identity-based cryptography within the routing exchange process, enabling networks to defend efficiently against route tampering and spoofing attacks from within the protocol itself. Furthermore, considering the various limitations on deploying secure OSPFv2 protocols at large scale, an operational mechanism supporting incremental deployment is designed using opaque link state advertisements. Simulation experiments demonstrate that the proposed intrinsically secure OSPFv2 protocol can resist source route spoofing and data tampering while minimizing convergence delay.


      A privacy protection recommendation algorithm in blockchain environment
      ZHAO Wen-tao, GUAN Li-he, HE Jian-guo, TANG Hao
      2024, 46(06): 1032-1040. doi:
      Abstract ( 92 )   PDF (888KB) ( 190 )     
      Recommendation algorithms in the blockchain environment have difficulty resisting malicious attacks and produce poor recommendation results. To address this problem, on the one hand, a fast homomorphic encryption algorithm based on integer vectors is proposed to protect the privacy of user data, and its security is guaranteed by the LWE problem. On the other hand, an efficient recommendation algorithm is designed based on E2LSH, which distributes the key according to the hash bucket number, so that users in the same hash bucket can perform homomorphic encryption operations and quickly calculate similarity. On a basic system model of blockchain+IPFS, a comparison experiment with the latest relevant privacy-preserving recommendation algorithms is conducted using public datasets. The results show that the proposed algorithms achieve good recommendation quality and speed while guaranteeing security and privacy.


      Graphics and Images
      A computer wargame path planning method based on influence map
      Lv Qian-ru, YANG Xiang-rui, CAI Zhi-ping
      2024, 46(06): 1041-1049. doi:
      Abstract ( 82 )   PDF (2262KB) ( 181 )     
      A computer wargame is a simulation tool that describes war operations. Force maneuver is the foundation of operations, and path planning is its core. Path planning is often simplified to solving the shortest path problem in graph theory. However, the tactical path is not equivalent to the shortest path. Given the complexity, information diversity, and dynamics of battlefield maneuvering, a decision-making method that separates the battlefield situation from path planning can lead to operational failure. This article provides a tactical path planning method that effectively integrates battlefield situation and map information. Building on the traditional A* algorithm, the method uses an influence map to digitize battlefield situation factors, and uses the digitized situation factors combined with terrain factors as the objective function of the improved A* algorithm. Thus, at the same algorithmic complexity, the A* algorithm converges to the tactically optimal path. Simulation experiments verify that this method supports more complex and diverse tactical path planning than the traditional A* algorithm. Battlefield situation information can guide path planning to reduce the damage suffered by friendly forces during maneuvering and improve their offensive capability, shaping a generally advantageous situation.
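The idea of folding situation factors into the A* objective can be illustrated with a minimal sketch. It assumes a 4-connected grid, a terrain cost of at least 1 per cell, a non-negative influence value per cell, and a simple additive step cost; the function name `plan_path` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import heapq

def plan_path(terrain, influence, start, goal, weight=1.0):
    """A* on a 4-connected grid.

    Assumed step cost for entering a cell: terrain + weight * influence.
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    rows, cols = len(terrain), len(terrain[0])

    def heuristic(cell):
        # Manhattan distance; admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], cost
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + terrain[nr][nc] + weight * influence[nr][nc]
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None, float("inf")
```

With `weight=0.0` the search reduces to ordinary shortest-path A*; a positive weight makes the path detour around cells with high threat influence.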

      An improved dense pedestrian detection algorithm based on YOLOv8: MER-YOLO
      WANG Ze-yu, XU Hui-ying, ZHU Xin-zhong, LI Chen, LIU Zi-yang, WANG Zi-yi
      2024, 46(06): 1050-1062. doi:
      Abstract ( 380 )   PDF (3288KB) ( 490 )     
      In large-scale crowded places, abnormal crowd gathering occurs from time to time, which challenges the dense pedestrian detection technology used in application scenarios such as autonomous driving and crowd monitoring systems for large public places. The new generation of dense pedestrian detection technology requires higher accuracy, lower computing overhead, faster detection, and more convenient deployment. To meet these requirements, a lightweight dense pedestrian detection algorithm based on YOLOv8, MER-YOLO, is proposed. It first uses MobileViT as the backbone network to improve the overall feature extraction ability of the model in pedestrian gathering areas. The EMA attention mechanism module is introduced to encode global information and further aggregate pixel-level features through dimensional interaction, and the detection of small targets is strengthened by adding a detection head at the 160×160 scale. Using Repulsion Loss as the bounding box loss function reduces missed and false detections of small-target pedestrians in dense crowds. The experimental results show that, compared with YOLOv8n, the mAP@0.5 of the MER-YOLO pedestrian detection algorithm is improved by 4.5% on the CrowdHuman dataset and 2.1% on the WiderPerson dataset, while the model has only 3.1×10^6 parameters and 9.8 GFLOPs, meeting the deployment requirements of low computing power and high precision.

      A small object detection algorithm of remote sensing image based on improved Faster R-CNN
      HU Zhao-hua, WANG Chang-fu
      2024, 46(06): 1063-1071. doi:
      Abstract ( 126 )   PDF (1062KB) ( 272 )     
      Object detection in remote sensing images is a critical issue in the field of object detection. Currently, most deep learning object detection models add an attention mechanism during the unidirectional feature fusion process, which enhances all types of objects indiscriminately and fails to highlight small objects. To achieve better detection results, an asymmetric high- and low-level modulation mechanism is introduced, constructing feature maps that consider both shallow detail information and high-level semantic information in order to enhance the features of small objects. Additionally, the DIoU loss function is used instead of the original SmoothL1 loss function to improve detection accuracy and convergence speed. Furthermore, flexible context information is introduced into the region of interest classification task to improve the classification accuracy of small objects. Experiments demonstrate that the proposed method achieves good performance on the DIOR and NWPU VHR-10 datasets.
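For reference, the commonly published form of the DIoU loss is 1 - IoU + d²/c², where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The sketch below implements that general formula for a single pair of axis-aligned boxes given as (x1, y1, x2, y2) with positive area; the function name `diou_loss` is hypothetical, and how the paper integrates the loss into Faster R-CNN is not shown in the abstract.

```python
def diou_loss(box_a, box_b):
    """DIoU loss for two axis-aligned boxes (x1, y1, x2, y2) with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    return 1.0 - iou + d2 / c2
```

Unlike SmoothL1 on box coordinates, this loss still gives a useful signal for non-overlapping boxes, because the center-distance term keeps decreasing as the predicted box moves toward the target.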

      Facial expression recognition based on network fusion to improve MobileViT
      DENG Xiang-yu, PEI Hao-yuan, SHENG Ying
      2024, 46(06): 1072-1080. doi:
      Abstract ( 107 )   PDF (1407KB) ( 200 )     
      From the perspective of lightweight models, a facial expression recognition network that improves MobileViT through network fusion is proposed. This network integrates the multi-scale convolution PSConv and attention mechanisms through residual structures to form the RAPsconv feature reconstruction module. This module extracts multi-scale features more efficiently at a fine granularity and enhances the expression of key features, thereby improving the network's expressive ability and yielding an end-to-end facial expression recognition network. Additionally, to better separate similar expressions, a loss function combining Softmax Loss and Center Loss is proposed, effectively reducing the misjudgment rate of expression recognition. Experimental results demonstrate that the improved network achieves higher accuracy than the base network MobileViT on three natural scene expression datasets, FER2013, FER+, and RAF-DB, with accuracy improvements of 1.73%, 2.18%, and 1.64%, respectively. The improved network has fewer parameters and stronger robustness, and is easy to integrate as a lightweight model, making it suitable for real-world facial expression recognition applications.


      Artificial Intelligence and Data Mining
      A population diversity-based robust policy generation method in adversarial game environments
      ZHUANG Shu-xin, CHEN Yong-hong, HAO Yi-hang, WU Wei-wei, XU Xue-yong, WANG Wan-yuan
      2024, 46(06): 1081-1091. doi:
      Abstract ( 91 )   PDF (1201KB) ( 185 )     
      In adversarial game environments, the objective agent aims to generate robust game policies that consistently ensure high returns when facing different opponent policies. Existing self-play-based policy generation methods often overfit to a specific opponent policy, resulting in low robustness and vulnerability to attacks from other opponent policies. Additionally, existing methods that combine deep reinforcement learning and game theory to iteratively generate opponent policies have low convergence efficiency in complex adversarial scenarios with large decision spaces. To address these challenges, a population diversity-based robust policy generation method is proposed. In this method, both adversaries maintain a policy population pool and ensure diversity within the population in order to generate a robust target policy. To ensure population diversity, policy diversity is measured from two perspectives: behavioral diversity and quality diversity. Behavioral diversity refers to the differences in the state-action trajectories of different policies, while quality diversity refers to the differences in the returns obtained when facing the same opponent. Finally, the robustness of the policies generated based on population diversity is validated in typical adversarial environments with continuous state-action spaces.


      Corrective-Net: A label association learning module for multi-label text classification
      XIAO Xin-zheng, HUANG Rui-zhang, CHEN Yan-ping, QIN Yong-bin, SONG Yu-mei, ZHOU Yu-lin
      2024, 46(06): 1092-1100. doi:
      Abstract ( 101 )   PDF (1390KB) ( 263 )     
      Current multi-label text classification faces two main problems: (1) emphasis is placed on learning text representations, and the association information between labels is insufficiently modeled; (2) where label association information is used to improve multi-label classification, its modeling relies too heavily on manually predefined external knowledge, which is costly to acquire and limits practical application. To solve these problems, this paper proposes a label association learning module for multi-label text classification, called Corrective-Net. The module automatically learns label association information from data without relying on external knowledge. It also uses label association information to correct the initial prediction of the basic classification module, so that the final prediction takes both semantic information and label association information into account and yields more accurate multi-label predictions. Extensive experiments on the AAPD and SO datasets show the universality and effectiveness of Corrective-Net. The effect of label correction on the performance of each label is analyzed, and explicit label association information is obtained and visualized.

      An EDAS decision making method and its application based on a novel picture fuzzy distance
      WANG Lei, LIU Ran-ran
      2024, 46(06): 1101-1111. doi:
      Abstract ( 75 )   PDF (576KB) ( 191 )     
      For multi-attribute decision-making problems in which the decision information is picture fuzzy, this paper first addresses the limitations of existing picture fuzzy distances by defining a parameterized assignment of the degree of refusal membership. It then proposes a picture fuzzy distance that reflects decision-makers' risk preferences by combining consistency concepts. Through a numerical example, the new picture fuzzy distance is compared with existing picture fuzzy distances to verify its superiority. Secondly, for attribute weights, this paper adopts the game-theoretic combination weighting method to combine the objective weights determined by entropy weighting with the subjective weights given by decision-makers. On this basis, the new picture fuzzy distance is extended to the evaluation based on distance from average solution (EDAS) method, and the weighted sums of positive and negative distances between each scheme and the average scheme are calculated using the new distance, resulting in comprehensive scores. Finally, numerical examples are used to verify the applicability and effectiveness of the proposed decision-making method. Sensitivity and comparative analysis results show that decision-makers can adjust the parameter values according to their risk preferences to meet different decision-making needs. The method is more general and flexible than other existing decision-making methods, and its ranking results are more reasonable.
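The EDAS scoring steps mentioned above (average solution, weighted positive and negative distances, comprehensive score) can be sketched for ordinary crisp numbers. This is the textbook crisp EDAS procedure, not the paper's picture fuzzy variant: the paper replaces the plain differences used here with its new picture fuzzy distance. The function name `edas_scores` is hypothetical, and all criteria are assumed to be benefit criteria with positive column averages.

```python
def edas_scores(matrix, weights):
    """Crisp EDAS appraisal scores; rows are alternatives, columns are benefit criteria."""
    n_alt = len(matrix)
    n_crit = len(weights)

    # Average solution for each criterion.
    avg = [sum(row[j] for row in matrix) / n_alt for j in range(n_crit)]

    # Weighted sums of positive (SP) and negative (SN) distances from the average.
    sp = [sum(weights[j] * max(0.0, row[j] - avg[j]) / avg[j] for j in range(n_crit))
          for row in matrix]
    sn = [sum(weights[j] * max(0.0, avg[j] - row[j]) / avg[j] for j in range(n_crit))
          for row in matrix]

    # Normalize; fall back to 1.0 to avoid dividing by zero when all distances are zero.
    max_sp = max(sp) or 1.0
    max_sn = max(sn) or 1.0
    nsp = [s / max_sp for s in sp]
    nsn = [1.0 - s / max_sn for s in sn]

    # Comprehensive appraisal score: higher is better.
    return [(p + q) / 2 for p, q in zip(nsp, nsn)]
```

Alternatives are then ranked by descending score; an alternative that is above average on every criterion scores close to 1, and one that is below average on every criterion scores close to 0.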

      A financial implicit sentiment analysis model based on sentiment enhancement and semantic dependency
      ZHANG Yu-ying, ZHU Guang-li, TAN Guang-pu
      2024, 46(06): 1112-1120. doi:
      Abstract ( 91 )   PDF (850KB) ( 195 )     
      Financial sentiment analysis is a technology for judging the sentiment orientation of financial texts, and is widely used in public opinion analysis and regulatory coordination. Because financial texts contain implicit sentiment information, it is difficult to determine sentiment polarity directly from sentiment features. To address this problem, a financial implicit sentiment analysis model based on sentiment enhancement and semantic dependency (FSED) is proposed to improve classification accuracy. Firstly, FinBERT is used to generate word vectors, which are then input into Bi-GRU to extract contextual semantic information. A dual-polarity attention mechanism is constructed by embedding positive and negative sentiment word vectors to extract sentiment feature vectors in two contexts. Then, based on the semantic dependency graph of the text, a dependency relationship matrix and a relationship type matrix are established. By combining these two matrices with the top-k strategy, a selection attention matrix is constructed. This matrix is then input into the graph convolutional network to extract semantic dependency features of the text. Finally, the features from sentiment enhancement and semantic dependency are fused and compressed using average pooling and max pooling layers. After that, the features are input into fully connected layers and Softmax to obtain the classification results. Experimental results show that, compared with A-GCN, FSED improves the accuracy of implicit sentiment analysis in the financial field.

      Text error correction of Burmese speech recognition based on phoneme fusion
      CHEN Lu, DONG Ling, WANG Wen-jun, WANG Jian, YU Zheng-tao, GAO Sheng-xiang
      2024, 46(06): 1121-1127. doi:
      Abstract ( 88 )   PDF (1286KB) ( 186 )     
      Text produced by Burmese speech recognition contains a large number of homophone and space errors. General methods use text semantic information to correct erroneous characters, but they are not accurate in locating and correcting Burmese space and homophone errors. Considering that Burmese is a tonal language with tone information embedded within its phonemes, this paper proposes a method for correcting errors in Burmese speech recognition text that incorporates phonemes. A parameter sharing strategy is used to jointly model the transcribed texts and their phonemes, and phoneme information is used to assist in detecting and correcting Burmese homophone and space errors. Experimental results show that, compared with the ConvSeq2Seq method, the F1 value of the proposed method on the Burmese speech recognition correction task has increased by 85.97%, reaching 79.15%.

      A heterogeneous guided whale optimization algorithm based on forward-reverse local exploitation and the golden sine algorithm
      XU Hui-ling, LIU Sheng, LI An-dong
      2024, 46(06): 1128-1140. doi:
      Abstract ( 76 )   PDF (1046KB) ( 200 )     
      The paper proposes a heterogeneous guided whale optimization algorithm (LEDGWOA) based on forward-reverse local exploitation and the golden sine algorithm to address the issues of low accuracy and poor stability in the Whale Optimization Algorithm (WOA). Firstly, the golden sine operator is embedded during the prey searching phase, enhancing the intensity of information exchange among individuals based on the principle of “better and closer.” Additionally, dominant whale groups are identified based on fitness values, and an adaptive inertial weight is calculated to determine a virtual leader. During the prey encircling phase, a bidirectional exploitation strategy incorporating Chebyshev threshold is integrated to strengthen neighborhood development intensity. Random spiral updates indirectly increase population diversity in later iterations. The improved algorithm is evaluated through simulation experiments on CEC2017 and CEC2019 functions and successfully applied to optimize the design of pressure vessels. LEDGWOA is compared against 17 other algorithms, demonstrating superior performance.