Computer Engineering & Science

Comparative study of big data active learning based on MapReduce and Spark

ZHAI Jun-hai1,2，QI Jia-xing1,2，SHEN Chu1,2，SONG Dan-dan1,2，WANG Mo-han1,2，TIAN Shi1,2

2019, 41(10): 1715-1722. doi:

In our previous work, we proposed a big data active learning algorithm based on MapReduce. In this paper, we port that algorithm to the Spark environment and propose a Spark-based big data active learning algorithm. We then compare the two algorithms experimentally on four aspects: running time, number of files, number of synchronizations, and memory cost. The comparison yields conclusions that should be helpful to researchers in related fields.

A parallel high utility itemset mining algorithm based on Spark

HE Deng-ping1,2,3，HE Zong-hao1,2，LI Pei-qiang1,2

2019, 41(10): 1723-1730. doi:

Traditional Top-K high utility itemset mining algorithms based on linked-list structures cannot meet mining requirements in a big data environment. To address this problem, a parallel high utility itemset mining algorithm based on Spark (STKO) is proposed. Firstly, the TKO algorithm is improved by speeding up the threshold increase and reducing the search space. Then, on the Spark platform, the original data storage structure is changed and broadcast variables are used to optimize the iterative process, which avoids a large number of recalculations, and a load balancing scheme is used to mine Top-K high utility itemsets in parallel. Experimental results show that the proposed algorithm can effectively mine high utility itemsets from big data sets.
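
The threshold-raising idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not the STKO or TKO implementation: the function name, the `(itemset, utility)` input format and the use of a min-heap are assumptions made for illustration. The sketch only shows how keeping the k best utilities found so far lets the minimum-utility threshold rise and prune weaker candidates.

```python
import heapq


def top_k_with_threshold(candidates, k):
    """Keep the k highest-utility itemsets seen so far.

    `candidates` is an iterable of (itemset, utility) pairs (hypothetical input
    format). The smallest utility in a full heap acts as the current threshold.
    """
    heap = []      # min-heap of (utility, itemset-as-sorted-tuple)
    threshold = 0  # minimum utility needed to enter the top-k list
    for itemset, utility in candidates:
        if utility < threshold:
            continue  # pruned: cannot be among the k best
        heapq.heappush(heap, (utility, tuple(sorted(itemset))))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the weakest entry
        if len(heap) == k:
            threshold = heap[0][0]  # raise the threshold to the k-th best utility
    return sorted(heap, reverse=True), threshold
```

The faster the threshold rises, the more candidates are skipped by the first check, which is why the abstract pairs a faster threshold increase with a smaller search space.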

A multi-core DSP development platform based on Ethernet and PCIe

ZHANG Xiang-yu,SHI Hui-li

2019, 41(10): 1731-1737. doi:

Multi-core digital signal processors (DSPs) are widely used in signal processing systems in aviation, aerospace and other fields. In practical engineering applications, the performance limits of the JTAG interface cause slow speed, instability and difficult operation, which lower development efficiency and seriously affect project progress. This paper proposes a multi-core DSP software and hardware development platform that uses Gigabit Ethernet and PCIe as loading and debugging interfaces and includes standardized hardware modules and a software development environment. The platform is easy to reconfigure and extend, no longer relies on the JTAG interface, is compatible with multiple operating systems, and has low resource occupation. Taking the multi-core DSP TMS320C6678 as an example, this paper describes the key techniques in developing a signal processing system composed of multiple DSP chips, including the COTS module, system architecture, hardware diagnosis, software loading and software debugging. The platform can significantly lower the barrier to application and greatly improve development efficiency.

An essential proteins prediction algorithm based on participation degree in protein complex and density

MAO Yi-min，LIU Yin-ping

2019, 41(10): 1738-1748. doi:

Identification of essential proteins in protein-protein interaction (PPI) networks tends to focus only on the topological characteristics of nodes, and PPI data contain many false positives. In addition, essential protein identification algorithms based on complex information do not comprehensively consider the neighborhood information of nodes or the influence of complex mining on the identification of essential proteins. As a result, the accuracy and specificity of the identification results are not high. To deal with these problems, an essential proteins prediction algorithm based on participation degree in protein complex and density (PEC) is proposed. Firstly, GO annotation information and the edge aggregation coefficient are used to construct a weighted PPI network, which reduces the influence of false positives on the experimental results. A similarity matrix is constructed from the edge weights of protein interactions, and the maximum difference between eigenvectors is used to determine the partition number K automatically. Meanwhile, K initial clustering centers are selected according to the degree of protein nodes in the weighted network. Spectral clustering and the fuzzy C-means (FCM) clustering algorithm are then combined to mine protein complexes, which improves clustering accuracy and reduces the data dimension. Secondly, a score for essential proteins is proposed based on the degree of participation in protein complexes and the neighborhood subgraph density. Experimental results on the DIP and Krogan datasets show that, compared with 10 classic algorithms (DC, BC, CC, SC, IC, PeC, WDC, LIDC, LBCC and UC), PEC correctly identifies more essential proteins with higher accuracy and specificity.

A memory built-in self-repair method for SoC design

QIN Pan1,WANG Jian2,ZHU Fang1,JIAO Gui-zhong1

2019, 41(10): 1749-1754. doi:

Built-in self-test and self-repair of embedded memory is an effective way to improve System-on-Chip (SoC) yield. This paper describes the memory yield evaluation method in detail and proposes a memory repair structure based on the Tessent tool from Mentor. The structure uses the redundancy repair method and the eFuse-based hard repair method. It has been applied to practical projects many times.

Correlation measurement of campus wireless network users based on the shortest time distance

LI Xin-jian，LIU Man-dan

2019, 41(10): 1755-1762. doi:

In campus networks, a large number of information systems record users' daily behavior. By analyzing the daily trajectory information of a large number of users, we can find behavioral correlations among users and measure the strength of their social relationships. Based on the data characteristics of the campus network system in a Shanghai college, we propose an improved method based on a user time series model, which uses the shortest time distance to measure the social relationship among users. The method first uses the users' behavioral data to generate a time series for each user. Based on the time series, it measures the behavioral correlation between two users to quantify the strength of their social relationship in the real world. Location popularity is used to correct the estimated strength of the social relationship. In the experiment, we apply the method to the data of the campus network system in one Shanghai college, measure the strength of correlation among users, and verify the effectiveness of the method.

A fatigue warning algorithm based on spatiotemporal feature extraction of facial motion

YU Song,LU Lin-yin

2019, 41(10): 1763-1770. doi:

At present, most fatigue warning algorithms rely on real-time monitoring and alarming, which carries considerable safety risks at high driving speeds. In view of the temporal correlation of the human fatigue state, this paper proposes an early warning algorithm based on spatiotemporal feature extraction of facial motion. Firstly, a convolutional neural network with a spatial transformation structure is constructed to identify the face region and to detect and mark the facial feature points. Secondly, a spatiotemporal feature extraction network is established, and the facial image feature sequence acquired in real time is used to predict and output the future image sequence. Finally, the combined states of the eyes and mouth in the predicted image sequence are used to decide whether to issue a fatigue warning. Experimental results show that, when images are acquired at 15 frames per second and the 30 frames of the next 2 seconds are predicted, the proposed algorithm achieves an accuracy of more than 90% when issuing a fatigue warning 26 frames (about 1.5 seconds) in advance, and an accuracy of 97% when issuing a fatigue warning 15 frames (1 second) in advance. At an average speed of 100 km/h on China's expressways, this is equivalent to a warning about 40 meters in advance, which can further reduce the occurrence of traffic accidents.
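
The 40-meter figure follows from multiplying driving speed by warning lead time. The short sketch below reproduces that arithmetic; the function name is hypothetical and the calculation assumes constant speed.

```python
def warning_distance_m(speed_kmh, lead_time_s):
    """Distance in meters travelled during the warning lead time at constant speed."""
    speed_ms = speed_kmh * 1000 / 3600  # km/h to m/s
    return speed_ms * lead_time_s


# 100 km/h with a 1.5 s lead time gives roughly 41.7 m, in line with the
# "about 40 meters" quoted in the abstract.
distance = warning_distance_m(100, 1.5)
```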

A smoke detection method based on fusing multiple network models

WANG Yang1，CHENG Jiang-hua1，LIU Tong1，ZHOU Yue-yong1，XIONG Yan-ye2

2019, 41(10): 1771-1776. doi:

In order to reduce false alarms in smoke detection caused by cloud and fog, a smoke detection method based on fusing multiple network models is proposed. The VGG16 network is used to extract the detailed features of smoke, and it is fused with the feature extraction layers of the ResNet50 network to extract more subtle features. The skip connection mechanism transfers image information to deeper layers of the neural network, which avoids losing important features of the smoke image and addresses the under-fitting problem caused by vanishing gradients. The training process adopts a feature transfer learning method based on isomorphic space to solve the small-sample training problem: the model is retrained in the new target detection domain to better integrate the network models, the output detection structure of the fully connected layer is rebuilt, and random inactivation (dropout) is used to improve the generalization ability of the model. Experimental results show that, compared with currently popular deep convolutional networks, this method has a lower false alarm rate and higher accuracy and recall.

A weighted graph aggregation algorithm based on structural similarity and attribute similarity

BING Rui1，MA Hui-fang1,2,3，LIU Yu-hang1，YU Li1

2019, 41(10): 1777-1784. doi:

Graph aggregation is a technique for representing a large-scale graph with a concise graph that preserves the structural and attribute information of the original graph. Existing algorithms consider either the attribute information of nodes or the weight information of edges, so the difference between the original graph and the aggregated graph can be large. We therefore propose a graph aggregation method that considers both the attribute information of nodes and the weight information of edges, so that the aggregated graph preserves both the similarity of node attributes and the edge weight information. Firstly, we define the closed neighborhood structural similarity and use a structure pruning strategy to calculate the structural similarity between nodes. Secondly, the minimum hash (MinHash) technique is employed to calculate the attribute similarity between nodes, and the proportions of structural similarity and attribute similarity are adjusted, based on which the weighted graph is aggregated. Experiments demonstrate the feasibility and effectiveness of our method.
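
As a rough illustration of how MinHash estimates attribute similarity between two nodes, the sketch below builds signatures from seeded hashes and compares them. It is a generic MinHash sketch, not the paper's implementation: the helper names, the MD5-based hash family and the default of 128 hash functions are assumptions, and attribute sets are assumed to be non-empty.

```python
import hashlib


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(attributes, num_hashes=128):
    """Signature = minimum hash value of the attribute set under each hash function."""
    return [min(_seeded_hash(seed, a) for a in attributes) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Two nodes with attribute sets `{"a", "b", "c", "d"}` and `{"c", "d", "e", "f"}` have a true Jaccard similarity of 2/6; the estimate approaches that value as the number of hash functions grows.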

Compressed image fusion based on image difference and weighted kernel norm minimization

SU Jin-feng,ZHANG Gui-cang,WANG Kai

2019, 41(10): 1785-1794. doi:

Existing image fusion algorithms suffer from problems caused by non-linear operations, such as noise interference and spatial complexity, which make fused images prone to distortion and information loss. Compressed sensing image fusion algorithms proposed by some scholars can effectively alleviate these problems. However, most of them neglect the low rank of the image matrix, which often reduces fusion quality. We therefore combine compressed sensing fusion with low-rank matrix approximation and propose an image fusion method based on information theory image difference and adaptive weighted kernel norm minimization. The method consists of three stages. Firstly, the two source images are sparsified with a wavelet sparse basis, and the measurement output matrix is obtained by compressive sampling with a structural random matrix. Then, the measurement output matrix is divided into blocks, and the fused measurement output matrix blocks are obtained with the image difference fusion algorithm. Finally, the block weights obtained by the adaptive weighted kernel norm minimization method are used to reconstruct the fused image with the orthogonal matching pursuit method. Experimental results verify the validity and generality of our method and show that it is superior to other fusion algorithms on many evaluation indexes.

A target tracking algorithm based on joint optimization of improved STC and SURF features

HUANG Yun-ming1，ZHANG Jing1,2，YU Xiao-hui1，TAO Tao3，GONG Li-bo4

2019, 41(10): 1795-1802. doi:

In the traditional spatio-temporal context (STC) tracking algorithm, the target window cannot adapt to changes in target scale, which leads to inaccurate target localization. To address this problem, we propose a target tracking algorithm based on joint optimization of improved STC and SURF features (STC-SURF). Firstly, the feature points of two adjacent frames are extracted and matched by the speeded up robust features (SURF) algorithm, and the random sample consensus (RANSAC) algorithm is used to eliminate mismatches and increase matching precision. Then, the target window is adjusted according to the change of the matched feature points between the two frames and output. Finally, the model update method of the STC algorithm is optimized to increase the accuracy of the tracking result. Experimental results show that the STC-SURF algorithm can adapt to target scale changes, and its tracking success rate is higher than that of the tracking-learning-detection (TLD) algorithm and the traditional STC algorithm.

An image matching method based on optimal threshold prediction under hybrid features

YAN Chun-man,HAO You-fei,ZHANG Di,CHEN Jia-hui

2019, 41(10): 1803-1808. doi:

Image matching with a single feature has a low matching rate, and the fixed contrast threshold of the scale-invariant feature transform (SIFT) algorithm leads to uneven extraction of feature points. To address these problems, we propose a novel image matching method based on adaptive threshold prediction under hybrid features. Firstly, the algorithm uses SIFT to extract image feature points. Then, we employ the texture parameter second moment method to adaptively calculate the optimal threshold, and use the descriptive texture feature vector to constrain the SIFT matching process. Experimental results demonstrate that the proposed method can adaptively select the contrast threshold according to the gray level distribution of the image, enhance image detail information and stabilize the number of extracted feature points. The texture vectors constrain the matching process to avoid mismatches between similar regions. The method is robust to illumination changes and image blur.

A stereo matching algorithm based on improved absolute difference cost and dynamic window

CHAI Yu,CAO Xiao-jing，LIU Jie

2019, 41(10): 1809-1815. doi:

Traditional sum of absolute differences (SAD) local stereo matching is prone to amplitude distortion, and the size of its matching window is difficult to choose. To address these problems, we propose an improved SAD local stereo matching algorithm. Firstly, based on the traditional SAD algorithm, we use the magnitude relationship of the Euclidean distance between pixels instead of the pixel difference as the similarity measure, which makes good use of the continuity constraint between the gray values of adjacent pixels. Under the epipolar constraint, a dynamic matching window based on the guided filter is established to preserve edge characteristics. Finally, a left-right consistency check is used to detect abnormal matching points, and the noise is then further smoothed to obtain the final disparity map. Experimental results show that the proposed algorithm is efficient and has high matching precision. It is more robust under illumination distortion and in depth-discontinuous regions with rich edge information.
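
For reference, the traditional SAD baseline that the abstract starts from can be sketched as follows. This is the textbook fixed-window version, not the improved algorithm: the function names are hypothetical, images are plain nested lists of gray values, and the caller is assumed to pass a pixel that lies at least `half_window` pixels away from every image border.

```python
def sad_cost(left, right, row, col, disparity, half_window):
    """Sum of absolute differences between the left window centered at (row, col)
    and the right window shifted `disparity` pixels to the left."""
    total = 0
    for dr in range(-half_window, half_window + 1):
        for dc in range(-half_window, half_window + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disparity])
    return total


def best_disparity(left, right, row, col, max_disparity, half_window=1):
    """Winner-takes-all: return the disparity with the lowest SAD cost."""
    # Cap the search so the shifted window never leaves the right image.
    limit = min(max_disparity, col - half_window)
    return min(range(limit + 1),
               key=lambda d: sad_cost(left, right, row, col, d, half_window))
```

Because the window size is fixed here, the sketch also shows where a dynamic window would plug in: `half_window` would vary per pixel instead of being a constant.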

A multi-feature multi-kernel ELM classification method for high resolution remote sensing images

CHU Heng1,2,3,4,CAI Heng1,2,3,SHAN De-ming1,2,3

2019, 41(10): 1816-1822. doi:

Given the complex and variable distribution of high-resolution remote sensing images and the fast classification performance of the extreme learning machine (ELM), we propose a multi-feature multi-kernel high-resolution remote sensing image classification method based on ELM. Firstly, the original image is roughly divided into several feature regions by the multi-scale segmentation algorithm. Then the object information of typical earth features is obtained by merging the coarse segmentation images according to the region merging criterion, and the spectral features and spatial features of the segmentation objects are extracted. A multi-kernel ELM via weighted combination of kernel functions is used to classify images, and the final classification results are obtained. Experimental results show that the proposed method not only reduces the requirements for the target training samples, but also improves the accuracy, timeliness and integrity of the classification.
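
A weighted combination of kernel functions can be written as K(x, y) = Σ w_i K_i(x, y). The sketch below shows only that combination step; the choice of an RBF and a linear kernel, the `gamma` value and the function names are assumptions for illustration, since the abstract does not state which kernels or weights are used.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel on two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def combined_kernel(x, y, kernels, weights):
    """Weighted sum of base kernels evaluated on the same pair of vectors."""
    if len(kernels) != len(weights):
        raise ValueError("one weight is required per kernel")
    return sum(w * k(x, y) for k, w in zip(kernels, weights))
```

With non-negative weights, a weighted sum of valid kernels is itself a valid kernel, so the combined function can replace a single kernel in a kernel-based classifier.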

An improved K-medoids algorithm based on density weight Canopy

CHEN Sheng-fa，JIA Rui-yu

2019, 41(10): 1823-1828. doi:

The K-medoids algorithm requires the number of clusters to be given manually and is sensitive to the initial cluster centers. To solve these problems and improve the accuracy and stability of K-medoids, we propose an improved K-medoids algorithm based on density weight Canopy. Firstly, we calculate the density value of each sample point in the data set, select the sample point with the maximum density value as the first cluster center, and remove its density cluster from the data set. Secondly, we select the other cluster centers by calculating the weights of the remaining sample points. Finally, the density weight Canopy is used as the preprocessing procedure of K-medoids, and its result provides the cluster number and the initial cluster centers for the K-medoids algorithm. The new algorithm is tested on several well-known real data sets from the UCI repository and on some artificial simulated data sets. Simulation results show that the new algorithm has higher clustering accuracy and better clustering stability.
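
The first step described above, picking the densest point as the first center and removing its neighborhood, can be sketched as follows. The abstract does not define the density measure, so this sketch assumes density is the number of points within a fixed `radius`; the function names and the radius parameter are hypothetical, and the weight-based selection of later centers is not covered.

```python
import math


def density(point, points, radius):
    """Number of points (including `point` itself) within `radius` of `point`."""
    return sum(1 for q in points if math.dist(point, q) <= radius)


def first_center(points, radius):
    """Return the densest point and the points left after removing its neighborhood."""
    center = max(points, key=lambda p: density(p, points, radius))
    remaining = [q for q in points if math.dist(center, q) > radius]
    return center, remaining
```

Repeating a selection step on `remaining` until no points are left would yield both a cluster count and a set of initial centers, which is the role the abstract assigns to the Canopy preprocessing.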

A graph model based on global domain and short-term memory factor

SHAO Yu-han，LI Pei-pei，HU Xue-gang

2019, 41(10): 1829-1836. doi:

Word sense disambiguation (WSD) is a challenging problem in natural language processing. As an excellent semi-supervised disambiguation algorithm, the genetic max-min ant system word sense disambiguation (GMMSWSD) algorithm can perform full-text WSD quickly. The algorithm uses a graph based on local context to represent semantic relationships for disambiguation. However, global semantic information is lost during disambiguation and inconsistent disambiguation results occur, which lowers the accuracy of the algorithm. We therefore propose an improved graph model based on global domain and short-term memory factor to solve these problems. The new graph model introduces global domain information to improve the handling of global semantic information. In addition, following the principle of short-term memory, we introduce a short-term memory factor into the model, which strengthens the linear relationship between semantics and avoids the influence of inconsistent disambiguation results. Experimental results show that the proposed model achieves higher disambiguation precision than the classical word sense disambiguation algorithm.

An edge-based 2-channel convolutional neural network and its visualization

LI Yu-chong，YAN Zhao-fan，YAN Guo-ping

2019, 41(10): 1837-1845. doi:

In order to improve the recognition accuracy of small-scale complex images, an edge channel is added to the LeNet-5 convolutional neural network to process edge information. By combining the different features generated by the two channels to construct a classifier, a 2-channel convolutional neural network is proposed to classify small-scale complex data sets. Classification results on ten types of product data show that the accuracy of the 2-channel convolutional neural network is much higher than that of the traditional network. Finally, a neural network visualization algorithm is adopted to visualize and analyze the 2-channel convolutional neural network.

An improved collaborative filtering recommendation algorithm based on expert trust

LIU Guo-li,BAI Xiao-xia,LIAN Meng-jie,ZHANG Bin

2019, 41(10): 1846-1853. doi:

To address the problems of cold start, sparse data, low scalability and low recommendation accuracy caused by insufficient consideration of the correlation between different community clusters, we propose a recommendation algorithm based on the trust of experts in the same community cluster and the trust of experts in different community clusters. For the similarity calculation, the improved algorithm combines the Jaccard correlation coefficient, the average score factor of users and a weighted Pearson correlation coefficient, together with a popularity term that penalizes popular items. For the score prediction, the improved algorithm introduces the trust of experts in the same community cluster into the traditional clustering recommendation algorithm, and also introduces the trust of experts in different community clusters. Experiments on the MovieLens dataset show that the improved algorithm not only alleviates the problems of cold start and data sparseness, but also significantly improves recommendation accuracy.
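
Two of the similarity ingredients named above, the Jaccard coefficient and the Pearson correlation coefficient, can be sketched on user rating dictionaries as follows. Multiplying the two is only one simple way to combine them and is an assumption made for illustration; the abstract does not give the paper's actual formula, and the average score factor and popularity penalty are omitted. All function names are hypothetical.

```python
import math


def jaccard(items_a, items_b):
    """Share of rated items that two users have in common."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def combined_similarity(ratings_a, ratings_b):
    """Illustrative combination (assumed): Jaccard overlap scales the Pearson score."""
    return jaccard(set(ratings_a), set(ratings_b)) * pearson(ratings_a, ratings_b)
```

Scaling by the Jaccard coefficient reduces the similarity of user pairs whose correlation rests on only a few co-rated items, which is one way to soften the effect of sparse data.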

A new KNN multi-label classification algorithm based on local positive and negative labeling correlation

JIANG Yun，XIAO Xiao，HOU Jin Quan，CHEN Li

2019, 41(10): 1854-1860. doi:

In multi-label learning, each sample is represented by a single instance and is associated with multiple class labels. Most existing multi-label learning algorithms explore label correlations globally, assuming that the positive label correlations are shared by all examples. However, in practical applications, different samples share different label correlations, and labels can be not only positively correlated but also mutually exclusive (i.e., negatively correlated). To solve this problem, we propose a KNN multi-label classification algorithm based on local positive and negative label correlation, named PNLC. Firstly, we preprocess the feature vectors of the multi-label data and construct the most discriminative features for each class. Then, in the training stage, the PNLC algorithm constructs the positive and negative label correlation matrices from the ground-truth labels of the k-nearest neighbors of every training sample. Finally, in the test phase, the k-nearest neighbors and the corresponding positive and negative pairwise label correlations of each test example are identified to calculate the maximum posterior probability and make the prediction. Experimental results show that the PNLC algorithm clearly outperforms other well-established multi-label classification algorithms on the yeast and image datasets.

Object root types design of domain knowledge graph ontology

WANG Ya-qiang1,2,ZANG Gen-lin1,2,WU Qing-rong1,2,ZHAN Chun-li1,2,XIE Xin-yang1,2

2019, 41(10): 1861-1867. doi:

Classifying the object root types is the basic work of constructing a domain knowledge graph, yet popular public knowledge repositories are not classified according to the characteristics of domain data. For instance, the only root type of SUMO is entity, which limits the expression of domain knowledge: knowledge such as texts, videos and pictures, and the relationships among them, cannot be fully expressed. The object root types of a domain knowledge graph ontology should include not only the entity type but also the event type, the text type and the multimedia type. Types are then extended from these four root types to express domain knowledge, so that various typical domain scenarios can be described well. In the development of actual knowledge graph tools, the proposal has provided a sound knowledge system and a clear classification scheme.
