  • A journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • High Performance Computing
      Beacon+: A scalable lightweight end-to-end I/O performance monitoring, analysis and diagnosis system
      2022, 44(09): 1521-1531. doi:
      Abstract: With the barrier to exascale computing broken, high performance computing has entered a new era. To meet the growing demand for data access, new technologies and storage media have been introduced into supercomputers, making their architectures increasingly complex and making it difficult to locate performance anomalies and system hotspots. To this end, Beacon+, a scalable lightweight end-to-end I/O performance monitoring, analysis and diagnosis system for exascale supercomputers, is designed and implemented. It can monitor and analyze the data access process of each application in real time without modifying application code or scripts. Through online and offline compression methods and distributed caching/storage mechanisms, Beacon+ remains highly scalable and low-cost, and can continuously and stably provide I/O diagnostic services. Using the new-generation Sunway supercomputer as the deployment platform, we demonstrate Beacon+'s low overhead, high accuracy and high I/O diagnostic efficiency through standard I/O benchmarks and real-world applications.


      SlurmX: A task scheduling system refactored from Slurm using object-oriented methodology
      WEN Rui-lin, FAN Chun, MA Yin-ping, WANG Zheng-dan, XIANG Guang-yu, FU Zhen-xin
      2022, 44(09): 1532-1541. doi:
      At present, the widely used Slurm task scheduling system suffers from bloated code, inefficient development of new functions and difficult maintenance. Drawing on the advantages and disadvantages of currently mature task scheduling systems (such as Slurm and HTCondor), this paper designs SlurmX, a task and resource scheduling system with excellent performance, scalability and maintainability. Object-oriented methodology is used to refactor and reorganize the internal components of Slurm from top to bottom at the functional level, and the paper discusses how system architecture design and internal component design provide high scalability and low coupling between internal modules while preserving performance.

      A Gatherv optimization method for large-scale concurrency
      SUN Hao-nan, WANG Fei, WEI Di, YIN Wan-wang, SHI Jun-da
      2022, 44(09): 1542-1549. doi:
      As an irregular MPI (Message Passing Interface) collective communication, Gatherv provides great flexibility for describing parallel communication behavior, but its irregularity makes it difficult to implement efficiently. Existing methods suffer from pronounced communication hotspots, high memory overhead and low memory access efficiency, and struggle to satisfy the performance requirements of today's large-scale parallel applications. A Gatherv optimization method for large-scale concurrency is proposed. Starting from key issues such as the optimization level and buffer management, it applies the binomial tree model commonly used in regular collective communication to the implementation of Gatherv. In addition, a message-chain scheduling scheme is proposed to further reduce overhead and improve the optimization effect. Test data shows that the proposed method effectively solves the performance problems of existing methods and achieves efficient scaling of Gatherv performance under large-scale concurrency.
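      To make the binomial-tree idea concrete, the sketch below gathers per-rank payloads to rank 0 along a binomial tree in mpi4py. It only illustrates the communication pattern, not the paper's implementation; the payload dictionaries, tags and Python-object send/recv calls are assumptions.

        # Minimal binomial-tree gather sketch (any rank count works).
        # Each round, ranks whose current mask bit is set hand their
        # accumulated buffers to a partner closer to the root (rank 0).
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        data = {rank: f"payload-{rank}"}    # locally contributed block
        mask = 1
        while mask < size:
            if rank & mask:                 # this rank hands off and exits
                comm.send(data, dest=rank - mask, tag=0)
                break
            partner = rank + mask
            if partner < size:              # absorb the partner's subtree
                data.update(comm.recv(source=partner, tag=0))
            mask <<= 1

        if rank == 0:
            print(sorted(data.keys()))      # root now holds all blocks

      Run under mpiexec, each round halves the number of active senders, which is the property the binomial tree uses to avoid the root-side hotspot of a naive linear gather.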


      Performance evaluation of Intel persistent memory for CFD applications
      WEN Min-hua, CHEN Jiang, HU Guang-chao, WEI Jian-wen, WANG Yi-chao, LIN Xin-hua
      2022, 44(09): 1550-1556. doi:
      In the field of scientific computing, data volumes are growing rapidly as the accuracy of numerical simulations increases. Traditional DRAM-based memory solutions are difficult to scale in capacity due to their high cost. In recent years, persistent memory technology has attracted increasing attention and is expected to solve this problem. Persistent memory sits between DRAM and SSD: compared with DRAM, it offers larger capacity and better cost performance, but lower performance. To assess its application performance, we evaluate Intel persistent memory for Computational Fluid Dynamics (CFD), an important area of scientific computing. In the experiments, persistent memory is used in the easiest-to-adopt Memory Mode, which requires no source code modification, and the test programs cover memory benchmarks and three common CFD algorithms. The results show that in Memory Mode, compared with DRAM as main memory, introducing persistent memory brings some performance loss for the different CFD algorithms, and the loss grows with the data size. On the other hand, deploying persistent memory enables a single server to support numerical simulations with extremely large data sizes.

      Exploring Windows application running environment in Linux-based supercomputing systems
      XU Hai-kun, XIE Yi-man, WU Qing, CHEN Jun, ZOU You
      2022, 44(09): 1557-1562. doi:
      The operating systems of most supercomputers are Linux-based, which limits the use of applications built for Windows. In addition, the difficulty of operating a supercomputing system discourages users who are unfamiliar with Linux, resulting in the loss of supercomputing center users. We explore ways of running Windows applications in the Linux environment of supercomputing systems while preserving the convenience of system operation and maintenance management. Through X11 forwarding, Wine, virtualization and other technologies, we provide users with a Windows program running environment that is compatible with the supercomputer job scheduling system, together with secure and stable file access methods. The configuration methods and examples in this paper can serve as solutions for supercomputing centers with similar requirements, broadening the scope of user software and improving user satisfaction.


      Computer Network and Information Security
      Research on task offloading for energy-saving marine edge computing
      JIANG Xin-xiu, CHANG Jun, LI Bo, YANG Zhi-jun, DING Hong-wei
      2022, 44(09): 1563-1573. doi:
      Aiming at the problems of unstable energy supply and long delays in marine communication networks, a hybrid-energy-supply edge computing offloading scheme is proposed. For energy supply, the mobile edge computing (MEC) server integrates a hybrid power supply and a hybrid access point: the hybrid power supply uses renewable energy to power the MEC server, with the power grid supplementing energy to ensure reliable operation, while vessel users harvest energy from radio frequency (RF) signals broadcast by the hybrid access point. For the task offloading problem, with the energy-delay trade-off as the optimization objective, a scheme jointly optimizing the task offloading ratio, local computing power and transmit power is formulated around this energy harvesting method. Finally, a dimension-reduction optimization algorithm simplifies the objective function to a one-dimensional multi-constraint problem over the task offloading ratio, and an improved whale optimization algorithm finds the minimal total execution cost. EdgeCloudSim simulation results show that the proposed scheme reduces the execution cost by 13.4% and 9.6% compared with an energy-harvesting-only scheme and a basic offshore communication network optimization scheme, respectively.
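      For readers unfamiliar with the whale optimization algorithm the paper improves upon, a minimal single-dimension version is sketched below; the cost curve, population size and bounds are invented stand-ins for the paper's energy-delay cost model, not its actual formulation.

        # Toy 1-D whale optimization over the offloading ratio x in [0, 1].
        import random, math

        def cost(x):                       # assumed energy-delay trade-off shape
            return 0.6 * (1 - x) ** 2 + 0.4 * (x - 0.3) ** 2

        def woa(iters=100, pod=20):
            whales = [random.random() for _ in range(pod)]
            best = min(whales, key=cost)
            for t in range(iters):
                a = 2 * (1 - t / iters)            # linearly decreasing coefficient
                for i, x in enumerate(whales):
                    r, p = random.random(), random.random()
                    A, C = 2 * a * r - a, 2 * random.random()
                    if p < 0.5:
                        if abs(A) < 1:             # encircle the current best
                            x_new = best - A * abs(C * best - x)
                        else:                      # explore around a random whale
                            rnd = random.choice(whales)
                            x_new = rnd - A * abs(C * rnd - x)
                    else:                          # spiral update toward the best
                        l = random.uniform(-1, 1)
                        x_new = abs(best - x) * math.exp(l) * math.cos(2 * math.pi * l) + best
                    whales[i] = min(max(x_new, 0.0), 1.0)   # keep ratio feasible
                best = min(whales + [best], key=cost)
            return best

        print(woa())    # near-optimal offloading ratio for the toy cost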

      An image encryption algorithm based on compressed sensing and DNA coding
      DENG Wen-bo, LIU Shuai, LIU Fu-cai, HUANG Ru-nan
      2022, 44(09): 1574-1582. doi:
      Aiming at the poor security and low transmission efficiency of traditional image encryption algorithms, an image compression and encryption algorithm based on compressed sensing (CS) and DNA coding is proposed. Firstly, CS is used to preprocess the image: the Kronecker product (KP) is used to construct the measurement matrix, reducing the plain image proportionally in size. Then, chaotic sequences generated by the hyperchaotic Bao system dynamically control the DNA encoding, decoding and operation modes used to encrypt and decrypt the compressed image, and DNA diffusion is performed on the image by integrating the generated chaotic sequences. Finally, the reconstructed image is obtained by a reconstruction algorithm. Simulation experiments and analysis show that the algorithm can effectively improve the efficiency and security of image transmission.
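      The Kronecker-product step can be illustrated in a few lines: a small seed matrix is expanded into the full measurement matrix, so only the seed needs to be stored or shared. The seed size and the Gaussian choice below are assumptions, not the paper's exact construction.

        # Sketch of building a larger CS measurement matrix via np.kron.
        import numpy as np

        rng = np.random.default_rng(0)
        seed = rng.standard_normal((4, 8)) / np.sqrt(4)   # small Gaussian seed block
        phi = np.kron(seed, seed)                         # 16 x 64 measurement matrix

        x = rng.standard_normal(64)                       # flattened image column
        y = phi @ x                                       # compressed measurements
        print(phi.shape, y.shape)                         # (16, 64) (16,)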

      A rectangular image scrambling method based on the square theorem and magic square
      LI Xiang-jun, YU Peng, LIU Bo-cheng, YUAN Ling-li
      2022, 44(09): 1583-1593. doi:
      Aiming at the insufficient variable-scale scrambling ability, poor scrambling effect and low security of magic square scrambling algorithms, a four-block magic square scrambling algorithm (FMSS) based on the four-square theorem and the magic square scrambling method is proposed. Firstly, according to the four-block rule of the four-square theorem, the plain image is divided into four square image blocks. Secondly, the magic square matrices required for scrambling are generated. Thirdly, each image block is scrambled, and stitching, transposition, shape changes and other operations are used to fully diffuse all pixels and restore the image to the size of the plain image. Finally, the ciphertext image is obtained through multiple rounds of scrambling. Experimental results show that the algorithm effectively enhances the scrambling effect and can scramble and restore rectangular images. It effectively reduces pixel correlation, offers better security, and satisfies the requirements on image scrambling algorithms in terms of robustness and scrambling and recovery speed.
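
      The four-block rule rests on Lagrange's four-square theorem: every natural number, including the pixel count of an arbitrary rectangular image, is a sum of four squares. A brute-force decomposition helper (hypothetical, for illustration only; FMSS's actual block assignment is more involved) makes the step concrete:

        # Find (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == n; a solution
        # always exists by Lagrange's theorem, though parts may be zero.
        import math

        def four_squares(n):
            for a in range(math.isqrt(n), -1, -1):
                for b in range(math.isqrt(n - a * a), -1, -1):
                    rest = n - a * a - b * b
                    for c in range(math.isqrt(rest), -1, -1):
                        d = math.isqrt(rest - c * c)
                        if d * d == rest - c * c:
                            return a, b, c, d

        # For a 500x300 image (150 000 pixels): square blocks of these sides.
        print(four_squares(500 * 300))    # (386, 30, 10, 2)
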
      An encryption and decryption outsourcing solution supporting attribute updates in a smart medical environment
      MA Jia-jia, CAO Su-zhen, DOU Feng-ge, DING Xiao-hui, DING Bin-bin, WANG Cai-fen
      2022, 44(09): 1594-1601. doi:
      Electronic medical records have become indispensable in medical systems, enabling the sharing of patient data and more convenient querying of medical histories. To address fine-grained access control between doctors and patients as well as patient privacy protection, an encryption and decryption outsourcing solution supporting attribute updates in a smart medical environment is proposed. By outsourcing encryption and decryption operations to fog nodes, the solution effectively reduces the computational burden on data owners and data users and improves efficiency. In addition, the solution supports updating patients' medical records, which better matches real application conditions. When the ciphertext is updated, the authorized institution transmits the attribute-related hash value to the medical cloud server, effectively protecting user privacy. Finally, the solution is proved secure under the DBDH hardness assumption, and analysis of experimental data shows that it has advantages in computational efficiency.

      An improved VIRE location algorithm for the whole region
      NIU Kun, GAO Zhong-he, ZHANG Fan
      2022, 44(09): 1602-1609. doi:
      The virtual reference tags of the VIRE positioning algorithm are distributed only in the central area, resulting in low positioning accuracy in non-central areas, and the algorithm must repeatedly adjust its threshold to the environment, which complicates experiments. Aiming at these problems, an improved whole-region VIRE algorithm is proposed. Firstly, virtual reference tags are arranged over the entire positioning area, and the RSSI value of each virtual reference tag is estimated by Newton interpolation. Then, neighboring reference tags are selected through a dynamic threshold, and the final neighboring tags are determined after checking their reliability. Finally, error correction is performed to obtain the coordinates of the tag being located. Simulation results show that the improved algorithm not only greatly improves positioning accuracy but also adapts well to different positioning environments.
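
      Newton interpolation, the estimation tool named above, is compact enough to sketch in full; the tag positions and RSSI readings below are fabricated one-dimensional examples, not measurements from the paper.

        # Newton divided-difference interpolation of RSSI at a virtual tag.
        def newton_interp(xs, ys, x):
            n = len(xs)
            coef = list(ys)                     # divided-difference table, in place
            for j in range(1, n):
                for i in range(n - 1, j - 1, -1):
                    coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
            result = coef[-1]                   # Horner-style evaluation
            for i in range(n - 2, -1, -1):
                result = result * (x - xs[i]) + coef[i]
            return result

        xs = [0.0, 1.0, 2.0, 3.0]               # real tag positions (1-D slice)
        ys = [-40.0, -46.5, -51.2, -54.8]       # measured RSSI in dBm
        print(newton_interp(xs, ys, 1.5))       # estimated RSSI at a virtual tag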

      A data sharing scheme for encrypted electronic health records
      NIU Shu-fen, YU Fei, CHEN Li-xia, WANG Cai-fen
      2022, 44(09): 1610-1619. doi:
      To realize fine-grained access control and the secure storage and sharing of electronic health record data, this paper proposes a cloud-chain collaborative storage and sharing scheme for electronic health records based on attribute encryption. In the scheme, a symmetric encryption algorithm encrypts the electronic health record, ciphertext-policy attribute-based encryption encrypts the symmetric key, and a searchable encryption algorithm encrypts the keywords. The electronic health record ciphertext is stored on the medical cloud, and the secure index is stored on the consortium chain. Secure keyword search is realized by searchable encryption technology, and user attribute updates are handled by proxy re-encryption technology. The scheme is proved to guarantee ciphertext and keyword security, and numerical simulation results show that it is effective.

      Graphics and Images
      A survey on deep learning based video anomaly detection
      HE Ping, LI Gang, LI Hui-bin
      2022, 44(09): 1620-1629. doi:
      In recent years, with the widespread use of video surveillance technology, video anomaly detection, which intelligently analyzes massive videos and quickly discovers abnormalities, has received wide attention. This paper gives a comprehensive survey of deep learning based video anomaly detection methods. Firstly, a brief introduction to video anomaly detection is given, covering basic concepts, basic tasks, the modeling process, learning paradigms and evaluation perspectives. Secondly, video anomaly detection methods are classified into four categories: reconstruction-based, prediction-based, classification-based and regression-based, and their basic modeling ideas, typical algorithms, advantages and disadvantages are discussed in detail. On this basis, the commonly used single-scene public datasets and evaluation indicators are introduced, and the performance of representative anomaly detection algorithms is compared and analyzed. Finally, a summary is given, and future directions for the datasets, algorithms and evaluation criteria of video anomaly detection are proposed.

      A dehazing algorithm based on transform domain and adaptive gamma correction
      WANG Rong, YANG Yan
      2022, 44(09): 1630-1637. doi:
      Aiming at the halo effect and sky-region color cast of the dark channel prior algorithm, an image dehazing algorithm based on the transform domain and adaptive gamma correction is proposed. By transforming the atmospheric scattering model into the logarithmic domain and combining it with dark channel prior theory, a positive correlation in the logarithmic domain is derived, and a Gaussian function is used to fit this correlation to obtain the coarse transmission. At the same time, the hazy image is converted to HSV color space and the brightness component is extracted to construct an adaptive gamma correction factor; the coarse transmission is corrected with this factor, and cross bilateral filtering further refines the transmission. Finally, the haze-free image is restored through the atmospheric scattering model with improved local atmospheric light. Experimental results show that, compared with some classic algorithms, the recovered images have rich details and more thorough dehazing, and their better color fidelity makes them closer to the real scene.
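      The gamma-correction step lends itself to a toy sketch: derive a gamma exponent from the HSV brightness channel and apply it to the coarse transmission map. The brightness-to-gamma mapping below is a common heuristic assumed for illustration, not necessarily the paper's factor.

        # Apply a brightness-derived gamma to a coarse transmission map.
        import numpy as np

        def adaptive_gamma(transmission, v_channel):
            mean_v = float(v_channel.mean()) / 255.0     # normalized scene brightness
            gamma = np.clip(-np.log2(mean_v + 1e-6), 0.5, 2.0)  # brighter -> smaller gamma
            return np.power(np.clip(transmission, 1e-3, 1.0), gamma)

        t_coarse = np.random.default_rng(1).uniform(0.2, 0.9, (4, 4))
        v = np.full((4, 4), 180.0)                       # bright sky-like region
        print(adaptive_gamma(t_coarse, v))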

      LUDet: A lightweight underwater object detector
      YU Ming-hao, GAO Jian-ling
      2022, 44(09): 1638-1645. doi:
      Aiming at the problem that traditional underwater object detectors are strongly affected by the environment, a new lightweight feature extraction network, LUNet, is proposed, and a lightweight detector, LUDet, is built on it with a two-stage detection algorithm. Firstly, in the first stage of the backbone network, efficient convolution pooling is used to obtain different feature expressions. Secondly, two-way dense connections are proposed on the basis of the dense connection structure to improve representation ability; the network is composed of a convolution-pooling layer and two dense connection structures, and the Ghost module replaces the 1×1 point convolutions in the network. Classification experiments on the CIFAR-10 and CIFAR-100 datasets show the effectiveness of the proposed backbone. For detection, LUDet detects targets through feature maps obtained after channel attention and multi-stage fusion. The detector is validated on two underwater datasets: its mAP reaches 52.5% on an underwater biological dataset and 58.7% on an underwater garbage dataset.
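      For reference, a minimal Ghost module of the kind substituted for the 1×1 point convolutions looks roughly as follows; the 50/50 channel split and layer sizes are conventional assumptions, not LUNet's exact configuration.

        # Slim primary convolution plus cheap depthwise "ghost" features.
        import torch
        import torch.nn as nn

        class GhostModule(nn.Module):
            def __init__(self, in_ch, out_ch):
                super().__init__()
                init_ch = out_ch // 2
                self.primary = nn.Sequential(
                    nn.Conv2d(in_ch, init_ch, 1, bias=False),
                    nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
                self.cheap = nn.Sequential(      # depthwise ghost branch
                    nn.Conv2d(init_ch, init_ch, 3, padding=1, groups=init_ch, bias=False),
                    nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))

            def forward(self, x):
                primary = self.primary(x)
                return torch.cat([primary, self.cheap(primary)], dim=1)

        m = GhostModule(64, 128)
        print(m(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 128, 32, 32])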


      An automatic cardiac magnetic resonance image segmentation algorithm based on deep learning
      LIU Cong-jun, XU Jia-chen, XIAO Zhi-yong, CHAI Zhi-lei
      2022, 44(09): 1646-1654. doi:
      Because it involves no ionizing radiation, cardiac cine magnetic resonance imaging (CMRI) has become a primary tool in cardiac diagnosis, and the accurate segmentation of the left ventricle, right ventricle and left myocardium is an important step before heart surgery. However, manual segmentation of cardiac structures is time-consuming and error-prone, so automatic segmentation of both ventricles and the myocardium is crucial. Firstly, U-Net++ is selected as the basic network framework. Secondly, to improve feature reuse and counter the overfitting caused by increased network depth, a dense residual module is introduced into the encoding part of U-Net++ so that more features can be learned during down-sampling. In the decoding part, to make the segmentation results better match the physical characteristics of the target organs, multiple convolution kernels are used to enlarge the receptive field and a long-distance dependency module shares global context information, so that the network captures as much inter-organ relationship information as possible during decoding and segments more accurately. Finally, given the connectivity and uniqueness of the two ventricles and the left myocardium, post-processing operations that keep the largest connected component and fill small holes are added. The experiments use the ACDC cardiac segmentation challenge dataset, which contains short-axis MRI images of 150 volunteers at the end-systolic and end-diastolic phases; results on its test set were obtained by online submission. Compared with other methods, the proposed method effectively segments the target organs: at end-diastole the Dice coefficients reach 0.96, 0.94 and 0.89 for the left ventricle, right ventricle and left myocardium respectively, and at end-systole the segmentation accuracy reaches 0.87, 0.86 and 0.89 respectively.
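      A loose sketch of a dense residual encoder block of the kind described is shown below; the channel widths, normalization choices and exact wiring are assumptions rather than the paper's module.

        # Two convolutions with dense reuse of inputs plus a residual shortcut.
        import torch
        import torch.nn as nn

        class DenseResidualBlock(nn.Module):
            def __init__(self, ch):
                super().__init__()
                self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                           nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
                self.conv2 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                           nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

            def forward(self, x):
                f1 = self.conv1(x)
                f2 = self.conv2(torch.cat([x, f1], dim=1))   # dense reuse of inputs
                return x + f2                                # residual shortcut

        block = DenseResidualBlock(32)
        print(block(torch.randn(1, 32, 64, 64)).shape)       # torch.Size([1, 32, 64, 64])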

      A lightweight text recognition algorithm for Chinese guide signs
      YI Chao-jie, CHEN Li, BAO Yu-xiang
      2022, 44(09): 1655-1664. doi:
      Aiming at the difficulty of extracting and recognizing multi-directional, multi-angle text in Chinese traffic guide signs, a lightweight text extraction and recognition algorithm is proposed that integrates convolutional neural networks and traditional machine learning methods. Firstly, the YOLOv5l object detection network is lightened into the proposed YOLOv5t network to extract text regions from road signs. Secondly, an M-split algorithm combining the projection histogram method and polynomial fitting is proposed to segment the extracted text regions. Finally, the MobileNetV3 lightweight network recognizes the text. On the self-made TS-Detect dataset, the algorithm achieves a close-shot text recognition accuracy of 90.1% at a detection speed of 40 fps, with a weight file of only 24.45 MB. The experimental results show that the algorithm is light and accurate enough for real-time extraction and recognition of Chinese guide sign text under complex shooting conditions.
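      The projection-histogram half of the M-split idea reduces to summing binarized text pixels per column and cutting at empty valleys; the toy bitmap and zero-valley rule below are placeholders (the paper pairs this with polynomial fitting to handle slanted text).

        # Per-column ink histogram with cuts at zero-valleys between characters.
        import numpy as np

        def column_splits(binary_img):
            hist = binary_img.sum(axis=0)        # per-column ink count
            gaps, in_char, start = [], False, 0
            for col, v in enumerate(hist):
                if v > 0 and not in_char:
                    in_char, start = True, col   # character begins
                elif v == 0 and in_char:
                    in_char = False
                    gaps.append((start, col))    # character ends
            if in_char:
                gaps.append((start, len(hist)))
            return gaps

        img = np.zeros((8, 12), dtype=np.uint8)
        img[2:6, 1:4] = 1                        # first "character"
        img[2:6, 6:9] = 1                        # second "character"
        print(column_splits(img))                # [(1, 4), (6, 9)]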

      Accurate location of unconstrained license plates based on cascaded CNNs
      XU Guang-zhu, KUANG Wan, WAN Qiu-bo, LEI Bang-jun, WU Zheng-ping, MA Guo-liang
      2022, 44(09): 1665-1675. doi:
      To solve the problem that the rectangular detection box output by a single deep convolutional neural network (CNN) cannot handle non-frontal license plates well in unconstrained scenarios, a solution based on cascaded CNNs is proposed. The solution cascades an object detection CNN with a classification CNN: the detection network obtains regions of interest, and a lightweight classification network then turns license plate vertex detection into a regression problem. Firstly, the YOLOv3 network locates license plates coarsely, yielding all candidate plate areas in an image. Then, an improved MobileNetV3 lightweight CNN locates the plate vertices within the candidate areas, achieving precise localization. Finally, the license plate is rectified and projected into a rectangular box through a perspective transformation. Experimental results show that the proposed cascade effectively overcomes the limitation that a single CNN object detector can only output a rectangular box and is unsuited to unconstrained license plate detection, and it shows good application potential.
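      The final rectification step corresponds to the standard OpenCV homography calls; the vertex coordinates and target plate size below are invented for illustration.

        # Warp four detected plate vertices onto an upright rectangle.
        import cv2
        import numpy as np

        quad = np.float32([[112, 208], [298, 196], [305, 258], [119, 271]])  # tl, tr, br, bl
        w, h = 240, 80                                    # target plate size in pixels
        rect = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

        image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for the input frame
        M = cv2.getPerspectiveTransform(quad, rect)       # 3x3 homography
        plate = cv2.warpPerspective(image, M, (w, h))     # rectified plate crop
        print(plate.shape)                                # (80, 240, 3)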

      An ultrasonic nerve segmentation algorithm based on U-Net with improved multi-scale fusion
      ZHANG Ke-shuang, WU Chun-xue, ZHANG Sheng, LIN Xiao
      2022, 44(09): 1676-1685. doi:
      Traditional cervical ultrasound nerve detection algorithms have low detection sensitivity, produce many false positives, and underuse low-level features, while neck ultrasound images are scarce, blurry at the edges and sensitive to noise. Therefore, an improved U-Net branch fusion algorithm is proposed. It improves the loss function to obtain high-quality candidate samples, replaces the ordinary convolutional layers of the original structure with a multi-scale convolution structure to enhance feature extraction, and substitutes dilated convolutions for the middle and deep pooling operations to improve the utilization of low-level features. Comparative experiments verify the performance of the proposed algorithm: compared with the traditional U-Net and SegNet convolutional networks, it improves small-size ultrasonic nerve segmentation by nearly 9% and 17% respectively, with higher segmentation accuracy for both normal-size and small-size nerves.

      Artificial Intelligence and Data Mining
      A core concept extracting method based on fuzzy Bayesian decision-making
      ZHONG Han, XU Yi-jia, LU Hao, SUN Jing-rui
      2022, 44(09): 1686-1692. doi:
      To improve the efficiency of domain concept extraction, a core concept extraction method based on fuzzy Bayesian decision-making is proposed. Firstly, candidate concepts are obtained by randomly extracting a large amount of text and sorting its vocabulary. Secondly, feature values of the candidate concepts are computed with the traditional TF-IDF algorithm and normalized into conceptual membership degrees. Finally, the probability that each candidate is a core concept is computed by Bayesian decision-making. Extraction experiments on the core concepts of financial texts show that the method's average accuracy is much higher than that of traditional TextRank, LDA, word2vec, RNN and LSTM approaches, and comprehensive results confirm that it performs better at core concept extraction.
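      A toy end-to-end pass over the three scoring steps (TF-IDF value, fuzzy membership normalization, Bayes posterior) is sketched below; the corpus, the exponential membership mapping, the prior and the likelihoods are all invented placeholders.

        # TF-IDF -> fuzzy membership -> Bayes posterior for "core concept".
        import math

        docs = [["liquidity", "risk", "bond"], ["risk", "hedge"], ["bond", "yield"]]
        term = "risk"
        tf = docs[0].count(term) / len(docs[0])                   # term frequency in doc 0
        idf = math.log(len(docs) / sum(term in d for d in docs))  # inverse doc frequency
        tfidf = tf * idf

        membership = 1 - math.exp(-3 * tfidf)       # fuzzy membership in [0, 1)
        p_core = 0.2                                # prior that a candidate is core
        p_m_core, p_m_not = membership, 1 - membership   # assumed likelihoods
        posterior = (p_m_core * p_core) / (
            p_m_core * p_core + p_m_not * (1 - p_core))
        print(round(tfidf, 3), round(posterior, 3))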

      Knowledge tracing based on contextualized representation
      WANG Wen-tao, MA Hui-fang, SHU Yue-yu, HE Xiang-chun
      2022, 44(09): 1693-1701. doi:
      Knowledge tracing (KT) is an important problem in educational data mining: it uses students' observable historical interaction data and the knowledge concepts (KCs) in exercises to infer the students' knowledge states (KS). Although existing efforts have achieved considerable success on this task, most of them overlook the importance of representing exercises through their KCs, and using contexts such as learning factors to represent KCs remains under-explored. To address this, a Knowledge Tracing method based on Contextualized Representations (KTCR) is proposed. Firstly, considering the complex contexts of the student learning process, a contextualized representation method for KCs based on response logs is devised to generate a contextualized Q-matrix. Then, the contextualized KCs and response logs are leveraged to re-represent exercise vectors. Finally, a Long Short-Term Memory network (LSTM) estimates the KS vectors of all students from their historical interaction data. Experiments on four real-world datasets demonstrate the rationality of the proposed exercise embedding and show that the method effectively estimates students' knowledge states.
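
      The final LSTM stage resembles the familiar deep-knowledge-tracing skeleton sketched below; the embedding scheme is a generic stand-in (KTCR's contextualized Q-matrix re-representation happens upstream of this step), and all sizes are arbitrary.

        # LSTM over (concept, correctness) interactions -> per-concept mastery.
        import torch
        import torch.nn as nn

        n_concepts, emb, hidden = 50, 64, 128
        encoder = nn.Embedding(2 * n_concepts, emb)       # (concept, correct?) pairs
        lstm = nn.LSTM(emb, hidden, batch_first=True)
        head = nn.Sequential(nn.Linear(hidden, n_concepts), nn.Sigmoid())

        interactions = torch.randint(0, 2 * n_concepts, (8, 20))   # batch of logs
        states, _ = lstm(encoder(interactions))
        mastery = head(states)                            # (8, 20, 50) knowledge state
        print(mastery.shape)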