Computer Engineering & Science

An agile verification method of IO Die

LUO Li, SHI Wei, HE Hong-jun, PAN Guo-teng, WANG Lei, GONG Rui

2023, 45(04): 571-576. doi:

An IO Die can serve as an IO extension chip or as a chiplet that is reused across multiple projects. This paper proposes an agile verification method for an IO Die. To reuse test cases across levels, the verification platform implements three levels (sub-system level, cluster level and whole-chip level), and the test cases at each verification level are optimized. Coverage-driven generation, configurable constraint generation and multi-objective optimization are adopted respectively to improve the efficiency of test case generation. Experimental results show that the method reduces cost and time and effectively achieves Known Good Die (KGD) while ensuring efficiency and reliability.

A universal design on hardware acceleration of convolutional neural networks

WANG Yu-lei, XIE Kai-liang, CHEN Si-yun, HU Jie, CHANG Sheng

2023, 45(04): 577-581. doi:

With the rise of artificial intelligence, the neural network algorithms used in various scenarios are developing rapidly and changing constantly. This makes a general acceleration design for edge deployment of such algorithms, represented by convolutional neural networks, a major problem. To address it, a universal design rule for hardware-parallel convolutional neural networks is proposed, based on the principle of data correlation and the Roofline model. The three most important parts, namely the convolution layer, the pooling layer and the fully connected layer, are optimized. Based on the optimized modules, various convolutional neural networks can be built according to the requirements of the application scenario, which achieves a universal design. With the LeNet-5 network as the verification object and the MNIST test set as the benchmark, verification was carried out on the XILINX ZC702 and XILINX ZC706 FPGA platforms. The interactive recognition system, built with high-level synthesis after each layer is optimized, achieves 95.09% accuracy and an inference speed of 4.1 ms per image on the XILINX ZC702 platform, and the same accuracy with an inference speed of 0.997 ms per image on the XILINX ZC706 platform. Both platforms achieve high processing speed.
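
The Roofline model named in this abstract bounds attainable throughput by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The following Python sketch illustrates that bound only. The layer shape is that of LeNet-5's first convolution layer (32×32×1 input, six 5×5 kernels); the traffic model (each input, weight and output value moves once) and the platform figures in the usage note are assumptions, not values from the paper.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def conv_layer_intensity(h_out: int, w_out: int, c_in: int, c_out: int, k: int,
                         bytes_per_value: int = 4) -> float:
    """FLOPs per byte for one stride-1, unpadded convolution layer.

    Assumption: every input, weight and output value crosses the memory
    interface exactly once (ideal on-chip reuse).
    """
    flops = 2 * h_out * w_out * c_in * c_out * k * k  # one multiply + one add per MAC
    h_in, w_in = h_out + k - 1, w_out + k - 1
    values_moved = h_in * w_in * c_in + k * k * c_in * c_out + h_out * w_out * c_out
    return flops / (bytes_per_value * values_moved)


# LeNet-5 first convolution layer: 28x28x6 output from a 32x32x1 input, 5x5 kernels.
c1_intensity = conv_layer_intensity(h_out=28, w_out=28, c_in=1, c_out=6, k=5)
```

With a hypothetical platform offering 100 GFLOP/s and 4 GB/s, `attainable_gflops(100, 4, c1_intensity)` falls on the memory roof, which is the kind of observation that motivates optimizing data reuse before adding compute units.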

Extraction technology of random telegraph noise under complex multi-trap conditions

XIAO Yu, JI Zhi-gang

2023, 45(04): 582-589. doi:

As integrated circuits develop and device sizes shrink, the number of traps in the MOSFET gate oxide layer increases. The resulting low-frequency noise, especially Random Telegraph Noise (RTN), becomes more and more pronounced and challenges device reliability. When a device contains more than one trap, the coupling effect between traps also has an important influence on the analysis of RTN signals. Research on RTN signal extraction under complex multi-trap conditions is therefore urgently needed. Existing RTN extraction techniques consume a lot of time when processing large amounts of data, and their degree of automation is low. In this paper, based on the non-Gaussian property of the RTN signal, an automatic RTN signal detection method and an automatic trap number detection method are proposed, which make RTN signal extraction more accurate and efficient. In addition, an adaptive model for important parameters in the iterative process is proposed, which accelerates the iteration for most RTN signal extractions. Finally, the method is used to extract the parameters of measured RTN signals, and the influence of the coupling effect on the parameters is analyzed.

A 128-core scalable architecture for Monte Carlo application

ZHANG Li, LI Tie-jun, ZHANG Jian-min

2023, 45(04): 590-598. doi:

The Monte Carlo method is an important method for studying particle transport problems. Designing a customized acceleration architecture for the Monte Carlo method has become a research hotspot in particle transport simulation. This paper profiles Quicksilver, a typical proxy application that uses the Monte Carlo method, and explores the architectural parameters that affect scalability, such as storage hierarchy and cluster size. Finally, a 128-core scalable architecture for Monte Carlo programs is proposed, which achieves a 90× speedup over a single core and a scaling efficiency of 70.1% on 128 cores.
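
The two reported figures are linked by the usual definition of scaling efficiency, speedup divided by core count. The short Python sketch below works through that arithmetic; the function names are illustrative. It suggests that the 90× figure is rounded, since an exact 90× on 128 cores would give about 70.3%, while 70.1% corresponds to a speedup of roughly 89.7×.

```python
def scaling_efficiency(speedup: float, cores: int) -> float:
    """Parallel (strong-scaling) efficiency: achieved speedup over ideal speedup."""
    return speedup / cores


def speedup_from_efficiency(efficiency: float, cores: int) -> float:
    """Inverse relation: the speedup implied by a given efficiency."""
    return efficiency * cores


efficiency_at_90x = scaling_efficiency(90, 128)         # 0.703125
implied_speedup = speedup_from_efficiency(0.701, 128)   # about 89.7
```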

A sparse interpolation algorithm based on modular arithmetic coefficient parsing

TANG Min, QI Niu-niu, DENG Guo-qiang

2023, 45(04): 599-606. doi:

Sparse multivariate polynomial interpolation is an effective strategy for reconstructing black-box functions by using the sparse structure of polynomials and the given interpolation point information, and it is widely used in science and engineering. The complexity of the traditional sparse interpolation algorithm based on the Prony method depends on the number of terms and the degree of the polynomial, and its efficiency is low on large-scale problems because it executes many high-order algebraic operations. Therefore, this paper proposes a sparse multivariate polynomial interpolation algorithm based on polynomial coefficient parsing. The core operation resolves the coefficients of univariate polynomials by modular arithmetic, which avoids the solution and root finding of higher-order equations required by the traditional method. In this method, the black-box polynomial is regarded as a univariate polynomial with one variable as the principal variable, and the multivariate polynomial is recovered by parsing the values of the coefficient polynomials of the principal variable at different interpolation points. Theoretical analysis and numerical experiments show that the algorithm is effective and feasible.

Experimental research on liquid cooling performance of a domestic dual-socket server

YE Qin, CHEN Cai, CHEN Biao, ZHANG Kun

2023, 45(04): 607-612. doi:

With the continuous improvement of server integration and the gradual increase in power density, heat dissipation has become a major obstacle to the development of servers. Compared with air cooling, liquid cooling dissipates heat better and has important application prospects for future servers with higher power density. In addition, liquid cooling technology can effectively reduce the PUE value of the data center and reduce carbon emissions. On this basis, this paper studies the heat dissipation performance of a domestic dual-socket server through experiments. Compared with traditional air cooling, liquid cooling has a better heat dissipation effect. Further, a comparison of internal and external circulation liquid cooling shows that external circulation liquid cooling offers better heat dissipation regulation, a wider temperature control range, and more suitable application scenarios. At a lower inlet water temperature and a larger inlet water flow, external circulation liquid cooling achieves a better heat dissipation effect.

A time-triggered system modeling method based on Event-B

YAO Xi-xin, ZHANG Bo, CHEN Xiang-lan, QIAO Lei, LI Xi

2023, 45(04): 613-621. doi:

Modeling and verifying timing properties is essential for safety-critical Cyber-Physical Systems (CPS). Model verification based on Event-B avoids the state space explosion problem of model checking methods based on state traversal and takes less time, so it is suitable for modeling high-concurrency systems. However, the common Event-B method has no time semantics, in particular none for the time-triggered property, which can greatly improve the predictability of systems. Firstly, based on Event-B, an abstract modeling framework for time-triggered systems (TTEB) is proposed, which models and refines timing properties for the behavior layer and the implementation layer. Secondly, the time-triggered properties of the behavior layer are modeled by a time-triggered transition composed of an ordered chain of events. The transition from the behavior model to the implementation model is realized by the refinement and decomposition of the global clock into local clocks. The periodic synchronization of distributed clocks in the implementation layer is modeled by the time-triggered transition. Finally, the modeling and verification of a master-slave car-following system demonstrates the usability and effectiveness of the method.

AUTOSAR operating system conformance test research based on decision table

CHEN Can, YANG Xing-da, FANG Ling

2023, 45(04): 622-629. doi:

The Automotive Open System Architecture (AUTOSAR) specification defines a series of abstract standard interfaces for in-vehicle embedded operating systems and related services, which have been widely used. The traditional AUTOSAR operating system conformance test method is not very specific and cannot test operating systems of different conformance class levels. The system under test must meet the requirements of the highest conformance class level to pass the test, so the test cases that match a lower level have to be extracted separately. This research designs a conformance test method for the AUTOSAR operating system based on the decision table. The decision table is a symbolic means of expressing the logical interdependence of events. It can enumerate complex logical relations and multiple condition combinations in detail, and can be reproduced according to requirements. By referring to conformance classes when designing test cases, each test case is given a conformance class attribute when it is generated, so testing can be performed for operating systems of different levels. In actual experiments, the method achieves targeted testing of operating systems with different conformance classes, and carries out tests on five functional modules across four conformance classes, avoiding the extra extraction of 379 test cases and improving test efficiency.
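
The idea of enumerating condition combinations in a decision table and tagging each generated case with a class attribute can be sketched in Python as follows. Everything concrete here is hypothetical: the three conditions, the outcome labels, and the two class names "A" and "B" are placeholders for illustration and are not taken from the AUTOSAR specification or from the paper.

```python
from itertools import product

# Hypothetical conditions for a task-activation service.
CONDITIONS = ("id_valid", "already_active", "multi_activation_supported")


def expected_outcome(id_valid: bool, already_active: bool,
                     multi_activation_supported: bool) -> str:
    """Hypothetical action part of the decision table."""
    if not id_valid:
        return "ERROR_ID"
    if already_active and not multi_activation_supported:
        return "ERROR_LIMIT"
    return "OK"


def build_decision_table() -> list[dict]:
    """One rule per combination of condition values, tagged with a class attribute."""
    table = []
    for values in product((True, False), repeat=len(CONDITIONS)):
        rule = dict(zip(CONDITIONS, values))
        rule["expected"] = expected_outcome(*values)
        # Rules that rely on multiple activation are only meaningful for class "B".
        rule["required_class"] = "B" if rule["multi_activation_supported"] else "A"
        table.append(rule)
    return table


def cases_for(table: list[dict], supported_classes: set[str]) -> list[dict]:
    """Select only the test cases that apply to the system under test."""
    return [rule for rule in table if rule["required_class"] in supported_classes]


table = build_decision_table()
class_a_cases = cases_for(table, {"A"})
```

Because the class attribute is attached at generation time, selecting cases for a lower-level system is a simple filter rather than a separate manual extraction step.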

A method for determining weak components of component-based software system

WANG Yu-zhuo, LIU Hai-tao, YUAN Hao-jie, ZHANG Zhi-hua

2023, 45(04): 630-637. doi:

A component-based software system is a system whose core is structure design. Determining the possible weak components of the system and eliminating potential dangers in the design stage are of great significance for ensuring the quality of the software system and reducing the waste and loss of resources caused by blind development. In this paper, two system parameters, the total number of system faults and the detection rate of system faults, are defined for software systems whose component reliability follows the G-O model, and two system parameter estimation models based on the corresponding component parameters are established. On this basis, a method for determining the weak components of the system is given, and the effectiveness of the proposed method is verified by simulation. The method can predict the components in which faults are most likely to be detected in the test or operation environment. Therefore, it has reference value for helping software designers determine weak components and optimize structure design.
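
For reference, the G-O (Goel-Okumoto) model gives the expected number of faults detected by time t as m(t) = a(1 - e^(-bt)), where a is the total number of faults and b is the fault detection rate. The Python sketch below only evaluates that function per component and ranks components by it. It is not the paper's system-level estimation model, which the abstract does not spell out, and the component names and parameter values are made up for illustration.

```python
import math


def go_expected_faults(a: float, b: float, t: float) -> float:
    """G-O mean value function: expected faults detected by time t."""
    return a * (1.0 - math.exp(-b * t))


def rank_components(params: dict[str, tuple[float, float]], t: float) -> list[str]:
    """Order components by expected detected faults at time t, highest first."""
    scores = {name: go_expected_faults(a, b, t) for name, (a, b) in params.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical (a, b) pairs per component.
components = {
    "parser": (40.0, 0.05),
    "scheduler": (25.0, 0.30),
    "logger": (10.0, 0.10),
}

early_ranking = rank_components(components, t=10)
late_ranking = rank_components(components, t=100)
```

In this toy data the ranking changes with the observation time: a component with fewer total faults but a high detection rate dominates early, while the component with the most total faults dominates later.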

Automatic code comment generation of Tree2Seq based on attention mechanism

ZHAO Le-le, ZHANG Li-ping, ZHAO Feng-rong

2023, 45(04): 638-645. doi:

Code comments help developers understand code quickly and reduce code maintenance costs. The classical Seq2Seq model compresses the structure information of the code into a sequence, which causes that structure information to be lost. To preserve it, a Tree-LSTM encoder is proposed that directly transforms the code into an abstract syntax tree for encoding, so that the comment generation model can effectively obtain the structure information of the code and generate better comments. A Tree2Seq model based on the attention mechanism is adopted for the code comment generation task, which avoids the partial information loss that occurs when the encoder compresses all input information into a fixed vector. Experiments are carried out on datasets for two programming languages, Java and Python. Three automatic evaluation metrics commonly used in machine translation are used for evaluation and verification, and some test data are selected for manual evaluation. Experimental results show that the Tree2Seq model based on the attention mechanism provides more comprehensive and richer semantic structure information to the decoder, and provides guidance for subsequent experimental analysis and improvement.

Improved InceptionV3 and transfer learning for solar panel defect recognition

SHI Ce, NAN Xin-yuan

2023, 45(04): 646-653. doi:

In view of the low accuracy and slow speed of traditional recognition methods for surface defects of solar panels, this paper proposes a method based on an improved InceptionV3 and transfer learning. Firstly, the collected solar panel images are preprocessed. Secondly, a new loss function with a balance factor δ is introduced to improve the InceptionV3 neural network and ensure the recognition rate of the network. Finally, a defect recognition model is established with the transfer learning method to further improve performance. The simulation results show that the method effectively improves the accuracy and speed of solar panel defect recognition. The recognition accuracy reaches 96.43%, which is 2.45% higher than that of the traditional InceptionV3 model, and the average classification time is shortened by 4.5 ms. The experimental results show that the method is effective and has good application prospects.

Mass detection of breast mammogram based on improved YOLOv4 model

BAI Yu-jie, PEI Yi-jian, ZHU Xiu-jun

2023, 45(04): 654-664. doi:

Mainstream object detection algorithms are rarely applied to the detection of benign and malignant masses in mammogram images, and where they are, detection accuracy is low and detection is slow. To address these problems, a mammogram mass detection model based on an improved YOLOv4 model is proposed. The method detects and classifies masses efficiently within a single framework. Firstly, the detection model introduces a multi-channel JAnet residual structure to improve the backbone network of the model. Secondly, depthwise separable convolution is introduced to replace the standard convolution in the original YOLOv4 model. Finally, a larger value averaging method is proposed in the post-processing stage. In the experiments, the DDSM (Digital Database for Screening Mammography) data set is used as the training set to train the detection model, and the INbreast data set is used as an independent test set. The experimental results show that, compared with the original YOLOv4 model, the proposed model increases Recall, mAP, FPS, and AUC by 7.3%, 6.45%, 5.9 fps and 13.02% respectively. The overall effect of the model is better than that of current mainstream object detection models, showing good robustness and effectiveness. The model can support computer-aided diagnosis in the clinical diagnosis of breast cancer.
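
Replacing standard convolution with depthwise separable convolution is mainly a parameter and computation saving. The Python sketch below counts weights for both forms, ignoring biases; the layer size used (3×3 kernel, 256 input channels, 512 output channels) is an arbitrary example and not a layer documented in the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 256, 512)
separable = depthwise_separable_params(3, 256, 512)
ratio = separable / standard   # equals 1/c_out + 1/k**2
```

For this example the separable form needs about 11% of the weights of the standard form, which is consistent with the general ratio 1/c_out + 1/k².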

Research on dynamic gesture recognition based on multimodal fusion

HU Zong-cheng, DUAN Xiao-wei, ZHOU Ya-tong, HE Hao

2023, 45(04): 665-673. doi:

To address the low accuracy and weak robustness of dynamic gesture recognition in complex environments, a dynamic gesture recognition algorithm based on multimodal fusion, named TF-MG, is proposed. TF-MG combines depth information and hand skeleton information, extracts the corresponding features with two different networks, and then fuses the extracted features in the classification network to realize dynamic gesture recognition. For the depth information, the motion history image method is used to compress the motion trajectory into a single-frame image, and features are extracted by MobileNetV2. For the hand skeleton information, DeepGRU, which is composed of gated recurrent units, is used to extract features. The experimental results show that, on the DHG-14/28 dataset, the recognition accuracy reaches 93.29% for 14 kinds of hand gestures and 92.25% for 28 kinds of hand gestures. Compared with other algorithms, it achieves higher recognition accuracy.
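
A motion history image compresses a frame sequence into one image in which recently moved pixels are bright and older motion fades. The Python sketch below uses the common update rule (set a pixel to tau where motion is detected, otherwise decay it by one toward zero) on plain nested lists. The frame-differencing threshold and the toy one-row frames are assumptions for illustration; the paper's exact preprocessing of depth frames is not described in the abstract.

```python
def update_mhi(mhi, prev_frame, frame, tau, threshold):
    """One MHI step: moved pixels are set to tau, all others decay by 1 (floor 0)."""
    updated = []
    for r, row in enumerate(frame):
        new_row = []
        for c, value in enumerate(row):
            moved = abs(value - prev_frame[r][c]) >= threshold
            new_row.append(tau if moved else max(0, mhi[r][c] - 1))
        updated.append(new_row)
    return updated


def motion_history_image(frames, tau, threshold):
    """Fold a list of equally sized 2D frames into a single motion history image."""
    mhi = [[0] * len(frames[0][0]) for _ in frames[0]]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, threshold)
    return mhi


# Toy sequence: one bright pixel moving left to right along a single row.
frames = [
    [[9, 0, 0, 0]],
    [[0, 9, 0, 0]],
    [[0, 0, 9, 0]],
    [[0, 0, 0, 9]],
]
mhi = motion_history_image(frames, tau=3, threshold=5)
```

The resulting row increases from left to right, so the direction of motion is encoded in a single frame that an image network such as MobileNetV2 can consume.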

Real-time vehicle detection at intersections based on improved YOLOv5+DeepSort algorithm model

JIA Zhi, LI Mao-jun, LI Wan-ting

2023, 45(04): 674-682. doi:

Traditional target detection and tracking algorithms have low detection accuracy and poor robustness, and intersections exhibit redundant image and video resources and high vehicle density. To address these issues, a real-time traffic flow detection method based on an improved YOLOv5 and DeepSort algorithm model is proposed. The experiment uses a data set combining MS COCO and BDD100k, and uses the improved YOLOv5 algorithm model to detect small target vehicles in video. Then, the deep learning multi-target tracking algorithm (DeepSort) is used to track and count the detected vehicles in real time, realizing end-to-end real-time traffic flow detection for intersection monitoring. By analyzing and comparing models with different parameters, the YOLOv5m model is finally selected. Experimental results show that the proposed method has a faster detection speed and a better detection effect for vehicles in complex environments, under vehicle occlusion and in high target density environments, with an average accuracy of 96.6%. The method meets the requirements of real-time target detection, is effective for vehicle detection at intersections, and meets practical requirements.

An image super-resolution reconstruction method based on multi-scale joint network

WANG Wan-jun, DING Xin-tao, LIU Chao, ZHANG Zhi-qiang

2023, 45(04): 683-690. doi:

Super-resolution is a widely used technology in many applications, such as video repair. To address the shortcomings of the Fast Super-Resolution Convolutional Neural Networks (FSRCNN) method, an image super-resolution reconstruction method based on a multi-scale joint network is proposed. Firstly, based on multi-scale structures, a feature sampling model is proposed to extract the features of the Low-Resolution (LR) image. Secondly, the features are enhanced by feature fusion and a sub-pixel convolutional layer. Finally, a joint loss function involving Mean Square Error (MSE) loss and Peak Signal to Noise Ratio (PSNR) loss is proposed to improve the optimization of network training. Comparison experiments were carried out on the Set5, Set14, and BSD100 data sets. The experimental results show that the method is superior to the state-of-the-art methods. In addition, the proposed method is applied to increase the resolution of the television dramas “Journey to the West” and “The Dream of Red Mansion”, achieving a good visual effect.
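
The two quantities in the joint loss are related by PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The Python sketch below computes both for flat pixel lists; the sample pixel values are made up. How the paper weights or combines the two terms into one loss is not stated in the abstract, so no combination is shown here.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equally long pixel sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, reconstructed)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical 4-pixel example.
high_res = [10, 20, 30, 40]
restored = [12, 18, 33, 40]
sample_mse = mse(high_res, restored)
sample_psnr = psnr(high_res, restored)
```

Because PSNR is a decreasing function of MSE, minimizing MSE already raises PSNR; a joint loss mainly changes how strongly errors are penalized at different error levels.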

Long-term recommendation based on bipartite network

WANG Mei-shen, ZHANG Peng, XUE Le-yang

2023, 45(04): 691-700. doi:

Most current studies of recommender systems based on the bipartite network focus on the short-term performance of algorithms. However, in real life, recommendation for each user is a long-term process, and online networks evolve over time. Meanwhile, users tend to select novel goods when shopping. Therefore, it is necessary to pay more attention to the diversity of long-term recommendations. To examine this problem, a classical algorithm that performs well in short-term recommendation is applied to long-term recommendation, and both the diversity and the accuracy of its long-term recommendations gradually decrease. To improve the performance of long-term recommendation, a recommendation algorithm that incorporates the time factor is designed and applied to long-term recommendation. Experimental results show that the proposed algorithm significantly improves long-term recommendation diversity without losing recommendation accuracy.

Customer satisfaction analysis based on fine-grained opinion mining and Kano model

ZENG Xiang-jun, YE Xiao-qing, LIU Dun

2023, 45(04): 701-710. doi:

Online reviews play an important role in customer relationship management, product marketing and other areas. Using online reviews effectively to analyze user satisfaction is crucial for enterprises to improve their services and products. The variable design of traditional satisfaction analysis methods often relies on expert advice and seldom considers the asymmetric influence of positive and negative attributes. To solve these problems, this paper uses opinion mining technology to explore the features of customers' online reviews and calculate service quality scores. In addition, PRCA technology is adopted to quantify the positive and negative influences of service attributes and to classify service attributes into Kano categories. Then, the characteristics of customer satisfaction for different brands under different granularities are analyzed, and the priority order of attributes for different customers is given. Finally, this paper mines five common attributes from coffee reviews. The experimental results show that different attributes have asymmetric effects on satisfaction and that the factors influencing customer satisfaction have different characteristics under different granularities. A corresponding refined enterprise management strategy is given.

Event extraction technology based on ALBERT pre-trained model

DU Jie, LUO Li-ming, SUN Zhong

2023, 45(04): 711-717. doi:

Information extraction technology is used to extract information of high interest from unstructured text data. Event extraction is a challenging research direction in the field of information extraction. Its purpose is to extract the key elements that describe events from unstructured text data and present them in a structured way. In this work, event extraction is treated as a sequence annotation task. Firstly, the ALBERT pre-trained model is used to learn the features. Then, a conditional random field is introduced to improve sequence annotation performance. Finally, the identification and classification of event types and event elements are completed. The experimental results on the ACE2005 standard corpus show that, compared with existing models, the ALBERT-CRF model improves the recall rate and F-score in trigger word recognition and classification tasks.
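
Treating event extraction as sequence annotation means that each token receives a tag and the structured event is read back from the tag sequence. The Python sketch below shows that last decoding step only, assuming the widely used BIO tagging scheme; the abstract does not state which scheme the authors use, and the sentence and label names are hypothetical.

```python
def decode_bio(tokens, tags):
    """Turn parallel token and BIO tag lists into (label, text) spans.

    "B-X" starts a span of type X, "I-X" continues the current span of type X,
    and any other tag (including a stray "I-" tag) closes the current span.
    """
    spans = []
    current_tokens, current_label = [], None

    def close_span():
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            current_tokens.append(token)
        else:
            close_span()
            current_tokens, current_label = [], None
    close_span()
    return spans


# Hypothetical tagged sentence: one trigger and two event elements.
tokens = ["The", "army", "attacked", "the", "old", "city"]
tags = ["B-Attacker", "I-Attacker", "B-Attack", "O", "B-Place", "I-Place"]
events = decode_bio(tokens, tags)
```

In an ALBERT-CRF style pipeline, the encoder and the conditional random field would produce `tags`; a decoder of this kind then presents the result in structured form.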

A strategy search method based on particle swarm optimization and deep reinforcement learning

PENG Kun-yan, YIN Xiang, LIU Xiao-zhu, LI Heng-yu

2023, 45(04): 718-725. doi:

Deep Reinforcement Learning (DRL) is a popular policy search method and has been successfully applied to a series of challenging control tasks. However, DRL is difficult to apply to large-scale practical problems because it handles sparse rewards poorly, lacks effective exploration, and has fragile convergence that is sensitive to hyperparameters. Particle Swarm Optimization (PSO) is an evolutionary optimization method that uses the cumulative reward of the entire episode as the fitness value and is insensitive to environments with sparse rewards. Moreover, it offers population-based diversified exploration and stable convergence, but its sample efficiency is low. In this paper, PSO is combined with DRL based on policy gradient. DRL uses the diverse data provided by the PSO population to train the policies with the lowest cumulative rewards in the population, and each time, the policies whose cumulative rewards improve after training are inserted back into the PSO population to enhance the information exchange between DRL and the PSO population. This algorithm, called PSO-RL, improves the sample efficiency of PSO and the performance and stability of the DRL algorithm. Experiments on challenging continuous control tasks in the PyBullet module show that PSO-RL performs better than both DRL and the evolutionary reinforcement learning algorithm.
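
The PSO half of this combination can be sketched with the standard velocity and position update, in which each particle is pulled toward its own best position and the swarm's best position. In the Python sketch below, a particle stands in for a policy parameter vector and a toy function stands in for the cumulative episode reward. The inertia and acceleration coefficients are common textbook values, not the paper's settings, and the DRL training and re-insertion step of PSO-RL is not shown.

```python
import random


def pso_maximize(fitness, dim, n_particles=10, iterations=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain PSO that maximizes `fitness`; returns best position, best value, history."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best_index = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best_index][:], pbest_fit[best_index]
    history = [gbest_fit]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            value = fitness(pos[i])
            if value > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], value
                if value > gbest_fit:
                    gbest, gbest_fit = pos[i][:], value
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def toy_return(params):
    """Stand-in for an episode's cumulative reward; peaks at 0 when every value is 0.5."""
    return -sum((x - 0.5) ** 2 for x in params)


best_params, best_return, history = pso_maximize(toy_return, dim=3)
```

Each fitness evaluation here corresponds to one full episode in the reinforcement learning setting, which is why plain PSO is sample-inefficient and why feeding gradient-trained policies back into the population can help.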

A graph neural network recommendation model based on multi-task learning

LUO Ke-jin, LIU Guang-cong, YANG Wen-hao

2023, 45(04): 726-733. doi:

The powerful ability of graph neural networks to process non-Euclidean data has drawn increasing attention to their application in the recommendation field. However, most existing recommendation models based on graph neural networks still use several adjacency matrices to represent heterogeneous information such as all kinds of node or edge attributes, and fail to make full use of the interaction of heterogeneous information. Therefore, this paper proposes a new graph neural network recommendation model, which models the rich interactions between all information entities as a heterogeneous graph and uses a dense subgraph sampling strategy to sample subgraphs of the heterogeneous graph. In addition, a multi-task learning method is added to the model to jointly optimize the link prediction and recommendation tasks, so that the model learns a better node representation and effectively improves the recommendation results. Experiments on two public datasets show that, compared with the baseline models, the proposed model improves the performance of the Top-N recommendation task.
