Computer Engineering & Science

A data heterogeneity processing method based on asynchronous hierarchical federated learning

GUO Chang-hao, TANG Xiang-yun, WENG Yu

Computer Engineering & Science 2024, 46 (07): 1237-1244.

Abstract （88）

PDF （1083KB）（320）

In the era of ubiquitous Internet of Things (IoT) devices, vast amounts of data with varying distributions and volumes are generated continuously, making data heterogeneity pervasive. Traditional synchronous federated learning mechanisms do not handle the non-IID data distribution problem effectively for intelligent IoT devices; they also suffer from single points of failure and the complexity of maintaining a global clock. Asynchronous mechanisms, in turn, may introduce additional communication overhead and model staleness caused by non-IID data. To offer a more flexible solution to these challenges, an asynchronous hierarchical federated learning method is proposed. First, the BIRCH algorithm is used to analyze the data distribution across IoT nodes and group the nodes into clusters. Next, the data within these clusters is examined and validated to identify nodes with high data quality. Nodes from high-quality clusters are then split off and reassigned to lower-quality clusters, forming new, optimized clusters. Finally, two-stage model training is carried out, consisting of intra-cluster aggregation followed by global aggregation. The approach is evaluated on the MNIST dataset. Compared with the classical FedAvg baseline, it converges faster on non-IID datasets and improves model accuracy by more than 15%.
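The two-stage aggregation mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the aggregation rule, so the code below assumes a FedAvg-style average weighted by local sample counts at both stages, and it runs synchronously for simplicity. The names `weighted_average` and `two_stage_aggregate` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# One update = (flattened parameter vector, number of samples behind it).
Update = Tuple[List[float], int]


def weighted_average(updates: Sequence[Update]) -> Update:
    """Average parameter vectors, weighting each by its sample count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("updates must cover at least one sample")
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


def two_stage_aggregate(clusters: Dict[str, Sequence[Update]]) -> List[float]:
    """Stage 1: aggregate inside each cluster. Stage 2: aggregate cluster models."""
    # Assumption: the same sample-weighted rule is used at both stages.
    cluster_models = [weighted_average(updates) for updates in clusters.values()]
    global_model, _ = weighted_average(cluster_models)
    return global_model
```

With sample-count weights at both stages, the result equals a flat weighted average over all nodes; a hierarchical scheme differs from flat FedAvg mainly in when and how often each stage runs, which this sketch does not model.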

Research and application of whale optimization algorithm

WANG Ying-chao

Computer Engineering & Science 2024, 46 (05): 881-896.

Abstract （185）

PDF （901KB）（504）

The Whale Optimization Algorithm (WOA) is a novel swarm intelligence optimization algorithm that converges based on probability. It features simple and easily implementable algorithm principles, a small number of easily adjustable parameters, and a balance between global and local search control. This paper systematically analyzes the basic principles of WOA and factors influencing algorithm performance. It focuses on discussing the advantages and limitations of existing algorithm improvement strategies and hybrid strategies. Additionally, the paper elaborates on the applications and developments of WOA in support vector machines, artificial neural networks, combinatorial optimization, complex function optimization, and other areas. Finally, considering the characteristics of WOA and its research achievements in applications, the paper provides a prospective outlook on the research and development directions of WOA.
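For readers unfamiliar with WOA, the following is a minimal sketch of its three standard moves: encircling the best solution, a spiral update around it, and random exploration. It is a simplified illustration, not code from the paper. In particular, it draws the coefficients `A` and `C` once per whale rather than per dimension, and the function name `woa_minimize` and its default parameters are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def woa_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_whales: int = 20,
    n_iter: int = 100,
    b: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimize f with a basic WOA; returns (best point, best value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=f)[:]
    best_val = f(best)
    history = [best_val]
    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: shrink towards the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Logarithmic spiral around the best whale.
                l = rng.uniform(-1.0, 1.0)
                new = [
                    abs(bj - xi) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bj
                    for bj, xi in zip(best, x)
                ]
            whales[i] = [min(hi, max(lo, v)) for v in new]  # clip to bounds
        cand = min(whales, key=f)
        cand_val = f(cand)
        if cand_val < best_val:
            best, best_val = cand[:], cand_val
        history.append(best_val)
    return best, best_val, history
```

The small parameter set noted in the abstract is visible here: apart from population size and iteration count, only the spiral constant `b` is exposed, and the schedule of `a` controls the shift from global to local search.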

J4 2005, 27 (12): 68-71.

Abstract （653）

PDF （265KB）（4179）

This paper analyzes and summarizes previous definitions of feature selection, and then introduces a self-contained definition. It divides feature selection into three classes according to the selection strategy, and categorizes the methods into five styles according to the evaluation function. By analyzing the factors that influence feature selection, this paper introduces some principles to guide practitioners who are searching for suitable features to solve real-world applications.

Bi-YOLO: An improved lightweight object detection algorithm based on YOLOv8n

Computer Engineering & Science 2024, 46 (08): 1444-1454.

Abstract （139）

PDF （3044KB）（142）

Single-stage object detection, as represented by YOLOv8, has a heavily optimized backbone network but fails to integrate contextual information efficiently in the neck network, leading to missed and false detections of small objects. In addition, its large number of parameters and high computational complexity make it unsuitable for end-to-end industrial deployment. To address these issues, this paper introduces the BiFormer attention mechanism, which is based on the Transformer structure, to enhance small-object detection and improve the algorithm's accuracy. At the same time, the GSConv module is introduced to reduce the model size without harming performance, offsetting the increase in computational and parameter costs brought by BiFormer. The resulting object detection algorithm, named Bi-YOLO, is designed to balance lightweight design against detection performance. Experimental results show that, compared with YOLOv8n, Bi-YOLO improves accuracy by 4.6%, increases small-object detection accuracy on the DOTA dataset by 2.3%, and reduces the number of parameters by 12.5%. Bi-YOLO thus provides a new approach for end-to-end industrial deployment.

A low-power keyword spotting system with SRAM buffer and computing-in-memory

HUANG Zhi-rui, JIA Xin-ru, ZHU Hao-zhe, CHEN Chi-xiao

Computer Engineering & Science 2024, 46 (08): 1331-1339.

Abstract （91）

PDF （2185KB）（139）

This paper proposes a low-power keyword spotting (KWS) system to address the high power consumption of KWS algorithms deployed on edge computing hardware, which can significantly reduce the battery life of mobile devices. The proposed KWS system is based on computing-in-memory (CIM) technology and software-hardware co-design. At the algorithm level, a ternary-quantized MFCC-CNN joint algorithm based on the standard MFCC algorithm topology is proposed, and all general matrix multiplications (GEMM) in MFCC are mapped to the neural network accelerator. At the circuit level, the system uses an SRAM-based CIM core to overcome the power and memory walls of traditional von Neumann accelerators. In addition, an SRAM buffer circuit based on a look-up table is proposed to replace the register delay chain; it multiplexes the memory array in the CIM core. Both the SRAM-based CIM core and the buffer are implemented using custom circuit units. At the system level, a low-power KWS system is built from these two customized circuits. The system is implemented using ASIC and custom circuit design methods and synthesized with a 28 nm process library. It achieves a processing delay of 64 ms on 10-class classification tasks, with a total power consumption of 645.28 μW. The dynamic power consumption of the MFCC pipeline accounts for 5.9% of the total dynamic power consumption, and its total power consumption accounts for only 1.3% of the system's power consumption.

A review of named entity recognition research

DING Jian-ping, LI Wei-jun, LIU Xue-yang, CHEN Xu

Computer Engineering & Science 2024, 46 (07): 1296-1310.

Abstract （162）

PDF （946KB）（299）

Named entity recognition (NER), as a core task in natural language processing, finds extensive applications in information extraction, question answering systems, machine translation, and more. Firstly, descriptions and summaries are provided for rule-based, dictionary-based, and statistical machine learning methods. Subsequently, an overview of NER models based on deep learning, including supervised, distant supervision, and Transformer-based approaches, is presented. Particularly, recent advancements in Transformer architecture and its related models in the field of natural language processing are elucidated, such as Transformer-based masked language modeling and autoregressive language modeling, including BERT, T5, and GPT. Furthermore, brief discussions are conducted on data transfer learning and model transfer learning methods applied to NER. Finally, challenges faced by NER tasks and future development trends are summarized.

Optimization of sparse matrix-vector multiplication based on FPGA and row folding

ZHOU Zhi, GAO Jian-hua, JI Wei-xing

Computer Engineering & Science 2024, 46 (08): 1340-1348.

Abstract （58）

PDF （2277KB）（131）

Sparse matrix-vector multiplication (SpMV) is a key kernel in scientific and engineering computing. Because of the irregular data distribution in sparse matrices and the irregular memory accesses in SpMV, its performance on multicore CPUs and GPUs still lags significantly behind the theoretical peak performance of these devices. The architectures of existing CPUs and GPUs limit their ability to exploit the special structure of sparse matrices to accelerate SpMV. Field-programmable gate arrays (FPGAs), by contrast, can achieve efficient parallel computing through customized circuits, which handle the computation and storage of sparse matrices better. An FPGA-based SpMV optimization method is proposed that uses a high-level synthesis streaming processing engine and an adaptive multi-row folding strategy. Through row folding, the method reduces the wasted storage and computation of zero elements in the processing engine, thereby improving the performance of FPGA-based SpMV. Experimental results show that, compared with existing FPGA implementations, the proposed row-folding dataflow engine achieves a maximum speedup of 1.78 times and an average speedup of 1.15 times.
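As background for the irregular memory access described in this abstract, here is a plain SpMV over the widely used compressed sparse row (CSR) format. This is a reference CPU kernel only; it is not the paper's row-folding engine, whose details the abstract does not give. The function name `spmv_csr` is an illustrative choice.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x where A is stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so accesses to x are data-dependent
            # and irregular; this is what makes SpMV hard to run near peak.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y
```

Rows can hold very different numbers of nonzeros, so a fixed-width hardware pipeline that processes one row per slot wastes work on short rows. That imbalance is the kind of inefficiency a row-folding strategy aims to reduce.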

A lightweight semantic segmentation based on attention mechanism

MA Dong-mei, WANG Peng-yu, GUO Zhi-hao

Computer Engineering & Science 2024, 46 (08): 1503-1512.

Abstract （69）

PDF （1024KB）（125）

Semantic segmentation is a computer vision technique that extracts the information of interest from a large number of images and transforms it into a clearer, easier-to-understand representation by means of a mask. Minimizing model size while preserving accuracy is currently a hot topic in the design of lightweight network models. Image semantic segmentation still faces many challenges, such as discontinuous segmentation, incorrect segmentation, and high model complexity. To address these problems, a lightweight semantic segmentation model based on an attention mechanism is proposed. It uses freeze-thaw training, with MobileNetV2 as the feature extraction network. To recover clearer target boundaries, a lightweight convolutional block attention module (CBAM) is introduced at the output of the atrous spatial pyramid pooling (ASPP) module, or channel attention (ECA-Net) is introduced in the decoder. To address the sample imbalance problem, the focal loss function is introduced. Mixed-precision training is used, and the standard convolution in the output section is replaced with DO-Conv. Experiments and validation are conducted on the PASCAL VOC2012 and Cityscapes datasets. The model size is 23.6 MB, with mean intersection over union (mIoU) scores of 73.91% and 74.89%, and class-wise pixel accuracy scores of 82.88% and 84.87%, respectively. The model thus balances segmentation accuracy against computational efficiency.
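The focal loss named in this abstract addresses class imbalance by down-weighting examples the model already classifies well. The sketch below shows the binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single prediction. Segmentation models apply a multi-class variant per pixel, and the abstract does not state which α and γ the paper uses, so the defaults here are only the commonly quoted ones.

```python
import math


def binary_focal_loss(
    p: float, y: int, gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-12
) -> float:
    """Focal loss for one prediction p = P(class 1) with label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)                # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor is 1 and the function reduces to α-weighted cross-entropy. With `gamma=2`, a positive example predicted at 0.9 contributes about one hundredth of its cross-entropy loss, so hard examples dominate the gradient.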

An improved dense pedestrian detection algorithm based on YOLOv8: MER-YOLO

WANG Ze-yu, XU Hui-ying, ZHU Xin-zhong, LI Chen, LIU Zi-yang, WANG Zi-yi

Computer Engineering & Science 2024, 46 (06): 1050-1062.

Abstract （342）

PDF （3288KB）（423）

In large, crowded venues, abnormal crowd gathering occurs from time to time, which challenges the dense pedestrian detection technology used in application scenarios such as autonomous driving and crowd monitoring systems for large public places. The new generation of dense pedestrian detection requires higher accuracy, lower computing overhead, faster detection, and easier deployment. To meet these requirements, a lightweight dense pedestrian detection algorithm based on YOLOv8, MER-YOLO, is proposed. It first uses MobileViT as the backbone network to improve the model's overall feature extraction ability in areas where pedestrians gather. The EMA attention mechanism module is introduced to encode global information and further aggregate pixel-level features through cross-dimension interaction, and detection of small targets is strengthened by adding a detection head at the 160×160 scale. Repulsion Loss is used as the bounding box loss function to reduce missed and false detections of small pedestrian targets in dense crowds. Experimental results show that, compared with YOLOv8n, MER-YOLO improves mAP@0.5 by 4.5% on the CrowdHuman dataset and by 2.1% on the WiderPerson dataset, while requiring only 3.1×10^6 parameters and 9.8 GFLOPs, which meets the deployment requirements of low computing power and high precision.
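The mAP@0.5 metric reported in this abstract counts a predicted box as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that matching test follows. Boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and the helper names are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, truth) >= threshold
```

In dense crowds, neighbouring pedestrians overlap heavily, so a box that drifts towards an adjacent person can fall below the threshold for its own target. This is the failure mode that a repulsion-style loss is meant to discourage.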

Survey of multiobjective simulated
annealing algorithm and its applications

LI Jinzhong,XIA Jiewu,ZENG Xiaohui,ZENG Jintao,LIU Xinming,LENG Ming,SUN Li

J4 2013, 35 (8): 77-88.

Abstract （261）

PDF （620KB）（625）

As a simple and effective multi-objective intelligent optimization algorithm, the multi-objective simulated annealing (MOSA) algorithm has been widely studied and successfully applied in various fields. This paper presents a systematic survey and discussion of the development of the MOSA algorithm and its applications over the past twenty years. Firstly, the generic framework of the MOSA algorithm is briefly described. Secondly, several typical MOSA algorithms are discussed, with emphasis on how each computes its acceptance probability function, and these algorithms are classified and analyzed. Thirdly, some typical applications of MOSA algorithms are introduced. Finally, promising directions and challenges for future research on the MOSA algorithm are proposed in light of current studies. This paper can serve as a comprehensive reference for future work on improving MOSA and applying it in practice.
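Since the survey stresses acceptance probability functions, a small example of one classical rule may help. The sketch below assumes all objectives are minimized and uses a weighted-sum (scalarized) Metropolis rule, which is only one of several rules used in MOSA variants. The weights and function names are illustrative and are not taken from the survey.

```python
import math
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def acceptance_probability(
    current: Sequence[float],
    candidate: Sequence[float],
    weights: Sequence[float],
    temperature: float,
) -> float:
    """Weighted-sum Metropolis rule for minimization problems."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Positive delta means the candidate is worse in the weighted sum.
    delta = sum(w * (c - f) for w, f, c in zip(weights, current, candidate))
    return 1.0 if delta <= 0 else math.exp(-delta / temperature)
```

With non-negative weights that are not all zero on the improved objectives, a candidate that dominates the current solution is always accepted. Worse candidates are accepted with a probability that shrinks as the temperature falls, which is what lets the search escape local optima early on.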

FDW-YOLO: An improved indoor pedestrian fall detection algorithm based on YOLOv8

CHEN Chen, XU Hui-ying, ZHU Xin-zhong, HUANG Xiao, SONG Jie, CAO Yu-qi, ZHOU Si-yu, SHENG Ke

Computer Engineering & Science 2024, 46 (08): 1455-1465.

Abstract （93）

PDF （1677KB）（122）

Fall detection in indoor scenes suffers from low accuracy and poor real-time performance because of lighting changes, occlusion of the human body, and changes in body posture under unusual viewpoints. To address this, a lightweight improved fall detection algorithm based on YOLOv8, named FDW-YOLO, is proposed. The C2f module in the backbone network is replaced by the FasterNext module, which reduces computational complexity while retaining strong feature extraction capability. Because posture changes greatly during a fall, three network structures are designed with dynamic deformable convolution modules added at different positions in the neck layer; they are compared experimentally on a self-made fall dataset, and the YOLOv8-C scheme is selected based on network performance. The WIoU bounding box regression loss function is introduced into the improved network to replace the original CIoU. Experimental results show that, compared with YOLOv8n, FDW-YOLO increases mAP@0.5 from 96.5% to 97.9% and mAP@0.5:0.95 from 72.5% to 74.3%, while the parameter count and computation are only 4.1×10^6 and 7.3×10^9, respectively, which meets the requirements for deployment in industrial scenarios with low computing power.

Survey on fuzzy testing technologies

NIU Sheng-jie, LI Peng, ZHANG Yu-jie

Computer Engineering & Science 2022, 44 (12): 2173-2186.

Abstract （585）

PDF （884KB）（694）

As software system security receives increasing attention, fuzz testing (fuzzing), a security testing technique for vulnerability detection, has become increasingly widely used and important owing to its high degree of automation and low false alarm rate. After continuous improvement in recent years, fuzz testing has made considerable progress in both technical development and application innovation. First, we briefly explain the concepts and basic theory of fuzzing, summarize its applications in various fields, and analyze the corresponding fuzz testing solutions according to the vulnerability mining needs of each field. Then, we focus on important recent developments in fuzz testing, including improvements and innovations in testing tools, frameworks, systems, and methods. We also analyze and summarize the innovative methods and theories adopted in each of these developments, as well as the advantages and disadvantages of each tool and system. Finally, from the perspectives of protocol reverse engineering, cloud platform construction, integration with emerging technologies, research on countermeasures against fuzz testing, and fuzzing tool integration, we suggest directions for further research on fuzz testing.

A survey of precipitation nowcasting based on deep learning

MA Zhi-feng, ZHANG Hao, LIU Jie

Computer Engineering & Science 2023, 45 (10): 1731-1753.

Abstract （717）

PDF （1495KB）（703）

Precipitation nowcasting refers to high-resolution prediction of precipitation in the short term, which is an important but difficult task. In the context of deep learning, it is viewed as a spatiotemporal sequence prediction problem based on radar echo maps. Precipitation prediction is a complex self-supervised task. Because motion changes significantly in both the spatial and temporal dimensions, ordinary models struggle to cope with complex nonlinear spatiotemporal transformations, resulting in blurred predictions. How to further improve prediction performance and reduce blurring is therefore a key research focus in this field. Research on precipitation nowcasting is still at an early stage, and existing work has not been systematically classified and discussed, so a comprehensive survey of the field is needed. This paper summarizes and analyzes the relevant knowledge in the field of precipitation nowcasting from different dimensions, and gives future research directions. The specific contents are as follows: (1) The significance of precipitation nowcasting and the advantages and disadvantages of traditional forecasting models are clarified. (2) The mathematical definition of the nowcasting problem is given. (3) Common predictive models are comprehensively summarized and analyzed. (4) Several open-source radar datasets from different countries and regions are introduced, and download links are given. (5) The metrics used for prediction quality assessment are briefly introduced. (6) The different loss functions used in different models are discussed. (7) Future research directions for precipitation nowcasting are pointed out.

A parallel fast neighbor searching algorithm for particle-based methods on CPU and GPU architectures in multi-scale simulation

DAI Chang-wei, KONG Rui-lin, JI Zhe

Computer Engineering & Science 2024, 46 (08): 1349-1360.

Abstract （63）

PDF （2237KB）（105）

Particle-based methods are widely used to resolve complex multi-scale physical phenomena in many areas of science and engineering. To handle the increasing computational complexity and declining concurrency of the pairwise particle search in massive multi-scale particle-based simulations, a new parallel fast neighbor searching algorithm featuring high concurrency and a low memory footprint is developed and demonstrated on both many-core CPU and GPU architectures. An inter-level interaction strategy based on a hierarchical nested data structure is proposed to resolve race conditions in cross-level particle search. An asymmetric mapping method is developed to eliminate the full mapping of particles on each level, which reduces memory consumption. A set of numerical experiments shows that the proposed algorithm can handle multi-scale problems with particle volume ratios of up to 10^8. Compared with the traditional algorithm, the proposed algorithm achieves speedups of 2x to 8x with lower memory consumption. The GPU-based implementation of the algorithm achieves state-of-the-art computational efficiency.
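For context, the sketch below shows the conventional single-level cell-list search that neighbor searching algorithms of this kind usually build on. It is a serial 2D baseline with one uniform cutoff radius. It is not the paper's hierarchical multi-level method, and the function name is illustrative.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def cell_list_neighbors(points: Sequence[Point], radius: float) -> Set[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, whose distance is <= radius."""
    # Bin particles into square cells whose side equals the cutoff radius.
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / radius), math.floor(y / radius))].append(idx)

    pairs: Set[Tuple[int, int]] = set()
    for (cx, cy), members in cells.items():
        # Any neighbor within the radius must sit in this cell or one of
        # the eight adjacent cells.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= radius:
                        pairs.add((i, j))
    return pairs
```

A single cell size works well only when all particles share one interaction radius. When particle sizes span many orders of magnitude, a uniform grid is either too fine for large particles or too coarse for small ones, which is the motivation for hierarchical, multi-level structures.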

Computer Engineering & Science 2009, 31 (12): 58-61.

Abstract （54）

PDF （408KB）（513）

A multi-stage feature distillation-weighted lightweight image super-resolution network

YANG Sheng-rong, CHE Wen-gang, GAO Sheng-xiang, ZHAO Yun-lai

Computer Engineering & Science 2024, 46 (08): 1433-1443.

Abstract （41）

PDF （1178KB）（103）

To address the insufficient receptive field for extracting low-level features and the lack of reinforcement of local key features in lightweight networks, this paper proposes LMSWN, a multi-stage feature distillation-weighted lightweight image super-resolution network. Firstly, a pyramid-like module is employed to expand the receptive field during shallow feature extraction, integrate feature information at different scales, and enrich the information flow of the network. Secondly, a multi-stage residual distillation-weighted module is designed to strengthen the ability of square convolution to extract local key features, recover more detail, and improve reconstruction performance. At the same time, combining channel separation with 1×1 convolution distills features gradually and reduces the number of network parameters. Finally, two adaptive parameters are introduced to jointly learn the features of the two branches of the multi-stage residual distillation-weighted module, increasing attention to feature information at different levels and further improving the representation ability of the network. Experimental results on five benchmark datasets (Set5, Set14, BSDS100, Urban100, and Manga109) show that the proposed network outperforms current mainstream lightweight networks.

A survey on deep learning based video anomaly detection

HE Ping, LI Gang, LI Hui-bin

Computer Engineering & Science 2022, 44 (09): 1620-1629.

Abstract （483）

PDF （612KB）（528）

In recent years, with the widespread use of video surveillance technology, video anomaly detection, which can intelligently analyze massive amounts of video and quickly discover abnormalities, has received wide attention. This paper gives a comprehensive survey of deep learning based video anomaly detection methods. Firstly, a brief introduction to video anomaly detection is given, covering the basic concepts, basic tasks, modeling process, learning paradigms, and evaluation perspectives. Secondly, video anomaly detection methods are classified into four categories: reconstruction-based, prediction-based, classification-based, and regression-based. Their basic modeling ideas, typical algorithms, advantages, and disadvantages are discussed in detail. On this basis, the commonly used public datasets for single-scene video anomaly detection and the evaluation indicators are introduced, and the performance of representative anomaly detection algorithms is compared and analyzed. Finally, the paper is summarized, and future development directions for the datasets, algorithms, and evaluation criteria of video anomaly detection are proposed.

J4 2008, 30 (2): 72-74.

Abstract （856）

PDF （229KB）（3775）

The traveling salesman problem (TSP) is a typical combinatorial optimization problem with practical application value. However, no efficient solution method for it is known to date. This paper therefore discusses traditional exact methods and popular meta-heuristic methods, along with the advantages and disadvantages of each. Future research directions for the TSP are also given.
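The trade-off between the two families of methods can be seen on a toy instance. The sketch below pairs an exhaustive exact search, which is feasible only for a handful of cities, with the nearest-neighbor construction heuristic. The heuristic is a simple stand-in chosen for brevity; it is not a meta-heuristic and is not drawn from the paper. All function names are illustrative.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def exact_tour(points: Sequence[Point]) -> List[int]:
    """Try every tour starting at city 0; cost grows as (n - 1)!."""
    rest = range(1, len(points))
    candidates = ((0,) + p for p in permutations(rest))
    return list(min(candidates, key=lambda order: tour_length(points, order)))


def nearest_neighbor_tour(points: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy heuristic: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour
```

The exact search guarantees an optimal tour but becomes impractical beyond roughly a dozen cities, while the greedy heuristic runs in quadratic time with no optimality guarantee. Meta-heuristics aim to sit between these two extremes.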

A probabilistic forecasting method with fuzzy time series

DONG Wen-chao, GUO Qiang, ZHANG Cai-ming

Computer Engineering & Science 2024, 46 (08): 1493-1502.

Abstract （45）

PDF （872KB）（98）

In time series prediction tasks, the uncertainty of historical observations poses difficulties in forecasting. However, the forecasting methods based on fuzzy time series have unique advantages in dealing with data uncertainty. Probabilistic forecasting, on the other hand, can provide the distribution of the predicted target and quantify the uncertainty of the prediction results. Therefore, a fuzzy time series probabilistic forecasting method based on a probability weighting strategy is proposed to reduce the impact of uncertainty on the forecasting task. The proposed method builds a probability-weighted fuzzy time series prediction model using historical observations of the target variable, and refines the fuzzy rule base of the prediction model by introducing additional observations. Specifically, two operators with low computational cost are used to reconstruct the fuzzy logic relationships. The intersection operator is used to exclude the interfering information, while the union operator merges all information, resulting in two different sets of fuzzy logic relationship groups. The relationship group corresponding to the current observation value in two sets is the prediction for the fuzzy set in the next moment. Finally, the probability distribution of the next moment is output by defuzzification. Experimental results on publicly available time series data sets verify the accuracy and validity of this method, and the prediction accuracy is remarkably improved in comparison to the newly proposed PWFTS prediction method.

Research on factors of heat dissipation of CPU chips in FCBGA package

CHEN Biao, CHEN Cai, ZHANG Kun, YE Qin

Computer Engineering & Science 2023, 45 (03): 406-410.

Abstract （258）

PDF （711KB）（252）

Thermal design is a very important part of chip packaging design and directly affects the temperature and reliability of the chip during operation. The dimensions and physical properties of the packaging materials inside the chip strongly influence its heat dissipation. The thermal resistance of the chip or its junction temperature can be used to measure heat dissipation performance. This paper studies the heat dissipation performance of a domestic FCBGA package by numerical simulation (finite volume method), and analyzes how factors such as material dimensions, the thermal conductivity of each layer in the CPU package, and power density affect CPU temperature and thermal resistance. The results show that when the thermal conductivity of TIM1 is lower than 35 W/(m·K), the thermal conductivity and thickness of TIM1 strongly influence the heat dissipation of the CPU; the die area (power density) also has a strong influence, whereas the die thickness has little effect.
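The two quantities used in this abstract, thermal resistance and junction temperature, can be related with a one-dimensional series-conduction estimate. This is a back-of-the-envelope model with no heat spreading, far simpler than the paper's finite-volume simulation. It uses the textbook relations R = t / (k·A) for a slab and T_j = T_a + P·ΣR. The function names are illustrative.

```python
from typing import Iterable


def conduction_resistance(thickness_m: float, conductivity_w_mk: float, area_m2: float) -> float:
    """Thermal resistance (K/W) of a uniform slab: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


def junction_temperature(ambient_c: float, power_w: float, resistances_k_w: Iterable[float]) -> float:
    """Junction temperature for layers in series: T_j = T_a + P * sum(R)."""
    return ambient_c + power_w * sum(resistances_k_w)
```

In this simplified model a layer's resistance falls as 1/k, so each further increase in conductivity helps less than the previous one. It also scales with 1/A, so a smaller die raises both power density and resistance. Both trends are qualitatively in line with the sensitivities reported in the abstract, although the model cannot reproduce the specific 35 W/(m·K) threshold.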
