  • A journal of the China Computer Federation
  • Chinese science and technology core journal
  • Chinese core journal

Current Issue

    • A review on the L0 instruction cache
      ZHANG Kun,HAO Ziyu,ZHENG Fang,XIE Xianghui
      2017, 39(03): 405-412. doi:

      Energyefficiency becomes one of the key constraints in the current design of processors. Since the instruction unit accounts for considerable chip area and power consumption, we propose an L0 instruction cache (L0 IC) to alleviate the power cost of the instruction units. The L0 IC has small size so the access power is relatively small. Meanwhile the L0 IC is tightly coupled with the pipeline in order to clockgate part of the pipeline logic when instruction fetches hit in the L0 IC. The recent studies on the L0 IC are reviewed. The development and application of each L0 IC design is presented. Meanwhile, future work on the L0 IC design is discussed.

      A loadadaptive feedback scheduling
      strategy for heterogeneous Hadoop cluster
      PAN Jiayi1,2,3,WANG Fang1,2,3,YANG Jingyi1,2,3,TAN Zhipeng1,2,3
      2017, 39(03): 413-423. doi:

      With the development and practice of big data technology, the Hadoop YARN (Yet Another Resource Negotiator) scheduler is no longer an effective solution in heterogeneous cluster environments. On the one hand, YARN cannot dynamically allocate node resources, which wastes the resources of better nodes and degrades overall system performance. On the other hand, YARN's static resource allocation policy ignores the differences between stages, which produces a large number of resource fragments. To address these problems, we put forward a load-adaptive feedback scheduling strategy. The system monitors the performance of all nodes and jobs and evaluates the computing power of each node using the real-time monitoring data. The scheduler then launches a dynamic resource scheduling strategy based on similarity assessment combined with the monitored node and job performance. The optimized system can distinguish the heterogeneity of different nodes, allocate resources dynamically according to tasks' real-time needs, refine YARN's scheduling semantics, and serve as a secondary resource scheduling strategy for the upper-level scheduler. We implement and test the strategy on Hadoop 2.0, and the experimental results show that this scheduling strategy significantly improves resource utilization, improves the cluster's concurrency by 2 to 3 times, and enhances performance by nearly 10%.

      Design and implementation of a parallel peer pressure
      clustering algorithm based on CombBLAS
      ZOU Pei-gang1,2,CHEN Jun1
      2017, 39(03): 424-429. doi:
      Graph clustering is the problem of finding natural groups with high connectivity in a graph. It is useful in fields such as machine learning, data mining, pattern recognition, image analysis and bioinformatics. To meet the graph-theoretic analysis demands of emerging "big data" applications, it is essential to accelerate the underlying graph computations on current parallel systems. However, it is difficult to parallelize large-scale graph computation and achieve good performance with traditional approaches because of the irregular graph structure and low operation intensity. We implement a scalable distributed-memory algorithm for peer pressure graph clustering using the sparse matrix infrastructure in Combinatorial BLAS. We first cast the peer pressure clustering algorithm as sparse matrix computation, which allows the irregular data structures and access patterns of parallel graph applications to be represented and efficiently addresses the graph-parallel challenge. The algorithm is then parallelized with the MPI programming model. Experiments show that when the scale of the graph represented by a sparse matrix reaches 4.3 billion, the linear-algebra-based parallel peer pressure clustering algorithm delivers high performance and good scalability on the Dawning supercomputer, with a speedup of up to 40.1x when the number of cores scales to 64.
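
      As an illustration of the linear-algebraic formulation described above, the sketch below expresses peer pressure clustering as sparse matrix operations (an SpGEMM followed by a per-column argmax). It is a serial SciPy illustration under our own assumptions, not the authors' Combinatorial BLAS/MPI implementation.

# A minimal, single-node sketch of peer pressure clustering expressed as
# sparse matrix operations (SpGEMM + per-column argmax). This serial SciPy
# version is only an illustration, not the authors' MPI implementation.
import numpy as np
import scipy.sparse as sp

def peer_pressure_clustering(A, max_iters=20):
    """A: symmetric n x n sparse adjacency matrix (CSR)."""
    n = A.shape[0]
    A = A + sp.identity(n, format="csr")        # self-loop: a vertex also votes for itself
    clusters = np.arange(n)                     # start with one cluster per vertex
    for _ in range(max_iters):
        # C[c, v] = 1 if vertex v currently belongs to cluster c
        C = sp.csr_matrix((np.ones(n), (clusters, np.arange(n))), shape=(n, n))
        T = C @ A                               # T[c, v] = votes vertex v receives for cluster c
        new_clusters = np.asarray(T.argmax(axis=0)).ravel()
        if np.array_equal(new_clusters, clusters):
            break
        clusters = new_clusters
    return clusters

if __name__ == "__main__":
    # Two triangles joined by a single edge: expect two clusters.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    rows, cols = zip(*edges)
    A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(6, 6))
    A = A + A.T
    print(peer_pressure_clustering(A))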
       
      CAF-WAS based pre-bonding TSV testing
      BIAN Jing-chang, LIANG Hua-guo, NIE Mu, NI Tian-ming, XU Xiu-min, HUANG Zheng-feng
      2017, 39(03): 430-435. doi:
      Through-silicon via (TSV) open/leakage defects reduce the reliability and yield of three-dimensional integrated circuits, so pre-bonding TSV testing is especially important. The existing charge and float, wait and sample (CAF-WAS) method for leakage defect testing is superior to other methods such as the ring oscillator; however, it cannot detect open defects. We propose a pseudo leak path concept to solve this problem. In addition, we redesign the waiting time generation circuit to reduce the test time overhead. HSPICE simulation results show that the proposed method can accurately detect open defects and the extent of leakage defects with only 25% of the test time of the existing method [14].
      Parallel iterative compilation
      based on multi-core architecture
       
      TAN Yan-dan,YI Hui-zhan,ZHANG Peng
      2017, 39(03): 436-442. doi:

      Iterative compilation is a compiler optimization technique that has been shown to significantly improve program performance. However, iterative compilation needs to compile and run a program multiple times, so the process is time-consuming. To make full use of existing multi-core computing resources and speed up the iterative process, we parallelize the evaluation phase of iterative compilation based on a new iterative framework named "OpenTuner". We also explore how parallelizing the evaluation phase affects the optimization results and analyze the experimental results.
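
      To illustrate the parallel evaluation phase discussed above, the following sketch farms candidate compiler configurations out to worker processes and keeps the fastest one. The flag list, benchmark source file, and exhaustive enumeration are hypothetical placeholders; they stand in for the paper's OpenTuner-based setup rather than reproduce it.

# A minimal sketch of parallelizing the evaluation phase of iterative
# compilation: each candidate flag set is compiled and timed in a separate
# worker process. Flags and benchmark are hypothetical placeholders.
import itertools
import os
import subprocess
import tempfile
import time
from concurrent.futures import ProcessPoolExecutor

CANDIDATE_FLAGS = ["-O2", "-O3", "-funroll-loops", "-ffast-math"]  # hypothetical search space
SOURCE = "benchmark.c"                                             # hypothetical benchmark

def evaluate(flags):
    """Compile SOURCE with the given flags, run it once, return (runtime, flags)."""
    exe = os.path.join(tempfile.mkdtemp(), "bench")
    subprocess.run(["gcc", *flags, SOURCE, "-o", exe], check=True)
    start = time.perf_counter()
    subprocess.run([exe], check=True)
    return time.perf_counter() - start, flags

def search(max_workers=4):
    # Enumerate small flag subsets; a real tuner samples the space instead of enumerating it.
    candidates = [list(c) for r in (1, 2) for c in itertools.combinations(CANDIDATE_FLAGS, r)]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate, candidates))   # evaluations run in parallel
    return min(results)                                  # fastest configuration wins

if __name__ == "__main__":
    best_time, best_flags = search()
    print(f"best flags: {best_flags}, runtime: {best_time:.3f} s")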

      An energy-efficient instruction cache:
      Asymmetric instruction cache
      LIU Xiao,GAO Hong-guang,CHEN Fang-yuan,DING Ya-jun
      2017, 39(03): 443-450. doi:

      Tag access and comparison consume a significant portion of an instruction cache's energy. We propose a power-efficient instruction cache based on inference: the asymmetric instruction cache. Sequential and non-sequential instructions are treated differently, and asymmetric ID invalid bits are applied according to the lower ratio of non-sequential instructions. A hit inference technique is used to reduce the number of tag comparisons in the cache, and adaptive-length instruction fetch techniques are designed to reduce the miss rate of sequential instruction blocks. Compared with the tagless hit instruction cache (TH IC), both the energy efficiency and the performance of the proposed method are improved. Experimental results show that the overall energy consumption is reduced by 40% to 60% relative to the L1 cache for most of the selected SPEC2006 benchmarks. The energy-delay squared product (ED2P) of instruction fetch is 50% lower than that of the L1 cache and 17% lower than that of the TH IC.

      QTRSM on ARMv8 64-bit multi-core processor
      DU Qi,JIANG Hao,LI Kuan,PENG Lin,YANG Can-qun
      2017, 39(03): 451-457. doi:

      We implement a quad-precision triangular solver with multiple right-hand sides (QTRSM) based on OpenBLAS on the ARMv8 64-bit multi-core processor. We propose two methods to implement QTRSM. One is based on the GCC compiler, which treats the long double data type as quad-precision floating-point numbers. The other uses the double-double data type and its corresponding quad-precision addition, subtraction, multiplication and division algorithms. We compare the two methods under different matrix sizes. Experimental results show that the two methods have the same accuracy; however, on average the method using the double-double format runs 1.6 times faster than the one using the long double format. As the number of threads increases, the speedups of both QTRSM implementations approach 2.0, showing good scalability.
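
      The double-double format mentioned above stores a quad-precision value as an unevaluated sum of two doubles. The sketch below shows the classic error-free transformations (two-sum, Dekker split, two-product) behind double-double addition and multiplication; it is a plain illustration of the arithmetic, not the authors' optimized ARMv8 kernels.

# A minimal sketch of double-double arithmetic: a quad-precision value is
# stored as an unevaluated sum hi + lo of two doubles, manipulated with
# error-free transformations. Illustration only.
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s + e == a + b exactly."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def split(a, factor=2**27 + 1):
    """Dekker split of a double into high and low halves."""
    t = factor * a
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a, b):
    """Error-free multiplication: returns (p, e) with p + e == a * b exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)

def dd_mul(x, y):
    """Multiply two double-double numbers given as (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return two_sum(p, e)

if __name__ == "__main__":
    p, e = two_prod(0.1, 0.1)
    print(p, e)                                 # product and its rounding error, recovered exactly
    print(dd_mul((0.1, 0.0), (0.1, 0.0)))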

      An optimization scheme for TSV-
      based 3D-SIC test scheduling
      NIE Mu1,LIANG Hua-guo1,2,BIAN Jing-chang2,NI Tian-ming2,XU Xiu-min2,HUANG Zheng-feng2
      2017, 39(03): 458-463. doi:
      3D-SICs use TSV technology to realize vertical interconnects, effectively improving system integration and overall performance. Since the limited numbers of test TSVs and test pins, together with the test power budget, all affect 3D-SIC test time, we propose a test scheme based on the container-packing problem, which optimizes test scheduling for both the "single tower" structure, with one chip per layer, and the "multiple towers" structure, with several chips per layer. This optimization scheme not only keeps the numbers of test pins and test TSVs and the power consumption under control, but also reduces test time effectively. Experimental results show clear improvements: test time is reduced by up to 45.28% for the "single tower" structure and by up to 27.78% for the "multiple towers" structure.
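
      The container-packing view mentioned above can be pictured as follows: each test session is a "bin" bounded by the available test pins/TSVs and the power budget, and each die's test is an "item". The sketch below packs tests with a first-fit-decreasing heuristic; all numbers and the heuristic itself are hypothetical illustrations, not the paper's optimization scheme.

# A schematic illustration of the container-packing view of test scheduling:
# tests are packed into sessions limited by pin and power budgets; total test
# time is the sum of the longest test in each session. Hypothetical numbers.
from dataclasses import dataclass, field

@dataclass
class Test:
    name: str
    pins: int      # test pins / TSVs the test occupies
    power: float   # power the test draws while running
    time: float    # test application time

@dataclass
class Session:
    pin_budget: int
    power_budget: float
    tests: list = field(default_factory=list)

    def fits(self, t):
        return (sum(x.pins for x in self.tests) + t.pins <= self.pin_budget and
                sum(x.power for x in self.tests) + t.power <= self.power_budget)

def schedule(tests, pin_budget, power_budget):
    sessions = []
    for t in sorted(tests, key=lambda x: x.time, reverse=True):  # first-fit decreasing
        for s in sessions:
            if s.fits(t):
                s.tests.append(t)
                break
        else:
            sessions.append(Session(pin_budget, power_budget, [t]))
    total_time = sum(max(x.time for x in s.tests) for s in sessions)
    return sessions, total_time

if __name__ == "__main__":
    dies = [Test("die0", 20, 1.5, 300), Test("die1", 15, 1.0, 250),
            Test("die2", 25, 2.0, 400), Test("die3", 10, 0.5, 150)]
    sessions, t = schedule(dies, pin_budget=40, power_budget=3.0)
    print(f"{len(sessions)} sessions, total test time {t}")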

       
      A model reduction method based
       on restarted Lanczos process
       
      YANG Ping1,XU Kang-li1,JIANG Yao-lin1,2
      2017, 39(03): 464-469. doi:

      We propose a model reduction method based on the restarted Lanczos process for large-scale linear time-invariant systems. Firstly, we use the restarted Lanczos process to obtain approximations of both the reachability Gramian and the observability Gramian of the original system. Then, based on the Lyapunov equations satisfied by the reachability and observability Gramians, we construct projected Sylvester equations, solve them, and apply a biorthogonalization process to the solutions to obtain the required transformation matrices, thus yielding the reduced-order system. Applying the proposed algorithm to large-scale linear time-invariant systems produces stable reduced-order systems with high accuracy. Numerical examples verify the effectiveness of the proposed algorithm.
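
      For background, the sketch below shows the basic (un-restarted) symmetric Lanczos process that the restarted variant builds on: it produces an orthonormal Krylov basis V and a tridiagonal matrix T with V^T A V ≈ T. It is only an illustration of the underlying process, not the paper's restarted, Gramian-oriented algorithm.

# A minimal sketch of the basic symmetric Lanczos process (no restarting,
# no reorthogonalization). Illustration only.
import numpy as np

def lanczos(A, b, k):
    n = len(b)
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    v_prev = np.zeros(n)
    v = b / np.linalg.norm(b)
    for j in range(k):
        V[:, j] = v
        w = A @ v
        alpha[j] = v @ w
        w = w - alpha[j] * v - (beta[j - 1] * v_prev if j > 0 else 0)
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:                      # Krylov subspace exhausted
            return V[:, :j + 1], alpha[:j + 1], beta[:j]
        v_prev, v = v, w / beta[j]
    return V, alpha, beta[:-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 50))
    A = M + M.T                                  # symmetric test matrix
    V, alpha, beta = lanczos(A, rng.standard_normal(50), 10)
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    print(np.abs(V.T @ A @ V - T).max())         # small residual expected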

      An optimized path selecting strategy in satellite
      network based on satellite link information
       
      ZHU Zhen-kai1,2,HUANG Chuan-he1,2
      2017, 39(03): 470-476. doi:

      Two-layered LEO/MEO satellite networks outperform single-layered satellite networks in communication service; however, frequent snapshot switching leads to high computation cost and poor handling of link congestion and node failure. Based on the characteristics of LEO/MEO satellite networks, we propose a routing protocol that calculates the actual communication cost between adjacent satellites, estimates the communication cost between a candidate satellite and the destination satellite, and selects an optimal path that ensures adequate link utilization and low delay. When a link switch or node failure occurs, we narrow the search area, so it is unnecessary to recalculate the entire path from the source node to the destination node; only the part affected by the failed nodes is updated. Simulation results show that the proposed protocol performs well in reducing path establishment overhead and link congestion.
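
      The path selection described above, combining the actual cost of traversed inter-satellite links with an estimated cost to the destination, resembles an A*-style best-first search. The sketch below illustrates that idea on a toy graph; the link costs and the hop-count estimate are hypothetical, not the paper's LEO/MEO cost model.

# A minimal A*-style sketch: actual link costs plus an estimated cost to the
# destination guide the search. Graph and estimate are hypothetical.
import heapq

def select_path(links, estimate, src, dst):
    """links: {node: [(neighbor, actual_cost), ...]}; estimate(n, dst) -> float."""
    frontier = [(estimate(src, dst), 0.0, src, [src])]   # (priority, cost_so_far, node, path)
    best_cost = {src: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == dst:
            return path, cost
        for nbr, link_cost in links.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best_cost.get(nbr, float("inf")):
                best_cost[nbr] = new_cost
                heapq.heappush(frontier,
                               (new_cost + estimate(nbr, dst), new_cost, nbr, path + [nbr]))
    return None, float("inf")

if __name__ == "__main__":
    links = {"S1": [("S2", 2.0), ("S3", 5.0)],
             "S2": [("S4", 2.5)], "S3": [("S4", 1.0)]}
    hops_to_dst = {"S1": 2, "S2": 1, "S3": 1, "S4": 0}    # crude distance estimate
    print(select_path(links, lambda n, d: hops_to_dst[n], "S1", "S4"))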

      An ABP algorithm for user influence
      analysis in social networks
      ZHANG Xiao-shuang1,XIA Qun-feng2,LIU Yuan1,XU Yan-fei1
      2017, 39(03): 477-484. doi:

      As a means of communication, social networks have become deeply rooted in daily life, and their user data carries great value in the big data era. With the opening of the Twitter Application Programming Interface (API), Twitter has become a popular research object, especially for studying user influence. The PageRank algorithm has long been used to calculate user influence; however, it depends too heavily on the following relationships between users, so the resulting ranking lacks timeliness. Introducing user activity into PageRank improves timeliness to a certain degree, but the result is still neither convincing nor accurate enough. We therefore propose an activity-based PageRank (ABP) algorithm that follows the time distribution of user activity and applies aging weight factors to the activity of different time periods. Finally, taking Twitter as the research object and combining it with the social relationship graph, an example analysis shows that the ABP algorithm is more efficient and persuasive, raising the ranking of active users and lowering the ranking of inactive users more accurately.
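
      As a rough illustration of weighting PageRank by time-decayed activity, the sketch below biases the teleportation vector with an aging-weighted activity score. The follower graph, activity counts, decay constant and bias scheme are hypothetical assumptions, not the ABP algorithm itself.

# A minimal sketch of activity-biased PageRank: user activity is aged with a
# decay factor and used as the teleportation distribution. Hypothetical data.
import numpy as np

def aged_activity(activity_by_period, decay=0.5):
    """activity_by_period[u] = [count_now, count_1_period_ago, ...]; newer counts weigh more."""
    return {u: sum(c * decay**k for k, c in enumerate(counts))
            for u, counts in activity_by_period.items()}

def activity_pagerank(follow, activity, d=0.85, iters=100):
    """follow[u] = users u follows; influence flows from followers to the users they follow."""
    users = sorted(set(follow) | {v for vs in follow.values() for v in vs})
    idx = {u: i for i, u in enumerate(users)}
    n = len(users)
    M = np.zeros((n, n))
    for u, followees in follow.items():
        if followees:
            for v in followees:
                M[idx[v], idx[u]] = 1.0 / len(followees)
        else:
            M[:, idx[u]] = 1.0 / n               # dangling user: spread rank uniformly
    a = np.array([activity.get(u, 0.0) for u in users])
    teleport = a / a.sum() if a.sum() > 0 else np.full(n, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = d * (M @ r) + (1 - d) * teleport     # activity-biased teleportation
    return dict(zip(users, r))

if __name__ == "__main__":
    follow = {"alice": ["bob"], "bob": ["carol"], "carol": [], "dave": ["carol"]}
    act = aged_activity({"alice": [5, 1], "bob": [0, 8], "carol": [2, 2], "dave": [0, 0]})
    print(activity_pagerank(follow, act))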

      A safety data dissemination algorithm
      based on data priority and vehicle density
      WANG De-wei1,2,HUANG Chuan-he1,2
      2017, 39(03): 485-491. doi:
      In VANETs, safety-related data often requires higher priority and preferential dissemination, which existing VANET data dissemination algorithms cannot satisfy. To meet the demand for differentiated transmission priorities of safety data in VANETs, we propose a data priority model and a safety data dissemination algorithm based on data priority and vehicle density, called priority density transmit (PDT). Firstly, to differentiate the priorities of different packets, we propose a data priority model that assigns priorities according to data state. Secondly, based on data priority and vehicle density, we propose the PDT dissemination algorithm, which dynamically assigns the counter threshold and the collision window according to data priority and vehicle density. Dynamically assigning the collision window ensures the transmission order of data, and dynamically assigning the counter threshold not only ensures reliable data transmission but also reduces the broadcast storm. We evaluate the performance of PDT by comparing it with the counter-based algorithm and the slotted-P algorithm. Simulation results show that PDT significantly improves vehicle throughput, reduces packet delay, and mitigates the broadcast storm.
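
      The sketch below illustrates the general idea of adapting the rebroadcast counter threshold and the collision (contention) window to data priority and local vehicle density. The scaling formulas and constants are hypothetical illustrations, not the PDT parameters.

# A schematic sketch of priority/density-adaptive broadcast parameters.
# All formulas and constants are hypothetical.
def dissemination_params(priority, density,
                         base_threshold=4, base_window=32,
                         max_density=50):
    """priority: 0 (low) .. 3 (high); density: number of neighbors in range."""
    # Denser neighborhoods need fewer duplicate rebroadcasts -> lower counter threshold.
    threshold = max(1, round(base_threshold * (1 - min(density, max_density) / max_density)))
    # Higher-priority packets contend with a smaller window so they are sent earlier.
    window = max(2, base_window >> priority)
    return threshold, window

if __name__ == "__main__":
    for prio in range(4):
        for dens in (5, 25, 50):
            print(prio, dens, dissemination_params(prio, dens))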
       
      A robust digital watermarking embedding strategy with
      blind detection based on Tetrolet transform and SVD
       
      BAO Lin,ZHANG Zhen-kai,LI Yuan-jiang,GONG Miao
      2017, 39(03): 492-499. doi:

      We present a robust digital watermark embedding strategy with blind detection based on the Tetrolet transform and singular value decomposition (SVD). Firstly, the binary watermark image is preprocessed with two-factor chaotic encryption. The original host image is decomposed by the Tetrolet transform, and the two-level low-frequency coefficients are then decomposed by blocked SVD. Lastly, the encrypted watermark information is embedded into the largest coefficients. The proposed strategy can extract the watermark image without reference to the original host image or the original watermark image, which achieves blind detection and improves the practicability of the digital watermarking system. Simulation results show that the proposed watermarking algorithm is secure, invisible, and robust to common image processing and geometric attacks.

      A scale-free network evolving model
      with core-periphery structure
      ZHANG Qi-lin1,DONG Zheng-cheng2,ZHAO Yong-biao1
      2017, 39(03): 500-504. doi:

      The core-periphery structure is an important and common structure in complex networks, yet there is little research on it. To study the properties of the core-periphery structure in complex networks, we analyze the structure of the random block model and, on this basis, propose a scale-free network evolving model with a core-periphery structure. Theoretical and numerical analyses verify that the proposed model exhibits scale-free characteristics and a core-periphery structure whose compactness is adjustable. The model provides a basis for further study of the characteristics of the core-periphery structure in complex networks.
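
      The sketch below gives one possible reading of such an evolving model: a small fully connected core is seeded, and each new peripheral node attaches by preferential attachment with a tunable bias toward core nodes that controls the compactness of the structure. The growth rule and all parameters are our own illustrative assumptions, not the paper's model.

# A minimal sketch of an evolving network with a designated core and
# preferential attachment for new (peripheral) nodes. Illustration only.
import random

def core_periphery_graph(n_core=5, n_total=200, m=2, core_bias=0.6, seed=1):
    rng = random.Random(seed)
    edges = [(i, j) for i in range(n_core) for j in range(i + 1, n_core)]  # complete core
    degree = {i: n_core - 1 for i in range(n_core)}
    for new in range(n_core, n_total):
        targets = set()
        while len(targets) < m:
            # with probability core_bias attach to a core node, otherwise to any node,
            # in both cases weighted by degree (preferential attachment)
            pool = list(range(n_core)) if rng.random() < core_bias else list(degree)
            weights = [degree[v] for v in pool]
            targets.add(rng.choices(pool, weights=weights)[0])
        for t in targets:
            edges.append((new, t))
            degree[t] += 1
        degree[new] = m
    return edges, degree

if __name__ == "__main__":
    edges, degree = core_periphery_graph()
    core_deg = sum(degree[v] for v in range(5)) / 5
    peri_deg = sum(degree[v] for v in range(5, 200)) / 195
    print(f"avg core degree {core_deg:.1f}, avg periphery degree {peri_deg:.1f}")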

      A WSN security situational awareness
      model based on set pair analysis
      ZHANG Shao-shuai,YUAN Jin-sheng
      2017, 39(03): 505-511. doi:
      To improve the accuracy and sensitivity of security evaluation for wireless sensor networks (WSNs), we introduce the concept of security situational awareness. We adopt the theory of set pair analysis to evaluate the security situation of WSNs, and the resulting security situation value is used to assess the threats the network has encountered. In the experiments, we use the KDD Cup 1999 data set to simulate the running state of a wireless network and analyze the changes in the security situation value under 11 network attacks, which are simulated by varying the number of attacked nodes at different intensities. Experimental results show that the proposed model improves the accuracy of network security assessment, divides the network security situation into three levels, and is more sensitive than security-entropy-based models in detecting medium- and low-intensity attacks.
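
      In set pair analysis, a connection degree μ = a + b·i + c·j summarizes the identity (a), discrepancy (b) and contrary (c) proportions of the observed indicators, with a + b + c = 1. The sketch below computes such a degree and collapses it into a scalar situation value; the thresholds, coefficients and three-level mapping are hypothetical, not the paper's model.

# A minimal sketch of the set pair analysis connection degree and a derived
# security situation value. Thresholds and coefficients are hypothetical.
def connection_degree(samples, normal_max, abnormal_min):
    """samples: indicator values; below normal_max -> identity, above abnormal_min -> contrary."""
    n = len(samples)
    a = sum(1 for s in samples if s <= normal_max) / n          # identity
    c = sum(1 for s in samples if s >= abnormal_min) / n        # contrary
    b = 1.0 - a - c                                             # discrepancy
    return a, b, c

def situation_value(a, b, c, i=0.0, j=-1.0):
    """Collapse the connection degree a + b*i + c*j into a scalar security situation value."""
    return a + b * i + c * j

if __name__ == "__main__":
    traffic = [0.1, 0.2, 0.15, 0.6, 0.9, 0.95, 0.3]  # hypothetical normalized indicator
    a, b, c = connection_degree(traffic, normal_max=0.4, abnormal_min=0.8)
    mu = situation_value(a, b, c)
    level = "secure" if mu > 0.5 else "warning" if mu > 0 else "dangerous"
    print(a, b, c, mu, level)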

      Blind stereo image quality assessment based on
      disparity map and complex contourlet transform
      WANG Gang,LI Chao-feng
      2017, 39(03): 511-518. doi:

      Given that existing no-reference quality assessment methods for 2D images cannot be applied directly to stereoscopic images, we propose a no-reference (NR) image quality assessment method based on the disparity map and the complex contourlet transform to effectively measure the quality of different types of distorted images. Firstly, we extract the disparity map that reflects the 3D information, decompose the image with the complex contourlet transform, and extract the following features: the energy of the sub-band coefficients within scales and the energy differences between scales. These features are then fed into a support vector regression (SVR) model to predict image quality. Experimental results show that the proposed method outperforms current 3D IQA methods.

      An algorithm for block two dimensional locality
      preserving projection based on L1 norm
      DING Ming,JIA Wei-min
      2017, 39(03): 519-523. doi:

      To address the problem that high-dimensional input data may be singular, and to improve the efficiency and robustness of the algorithm, we propose block two-dimensional locality preserving projections based on the L1-norm (B2DLPP-L1). Traditional locality preserving projection (LPP) uses principal component analysis (PCA) to project input data into a PCA subspace to avoid the singularity problem; however, input data can lose useful information in this way. B2DLPP-L1 takes two-dimensional data as input, divides the original images into modular sub-images, uses these sub-images as the new input data, and then applies the proposed algorithm to the sub-images to reduce dimensionality. In theory, B2DLPP-L1 reduces dimensionality more effectively, preserves the useful information of the input data, reduces computation, improves efficiency, and improves classification accuracy and robustness. Experimental results on face databases show that B2DLPP-L1 needs less time for nearest-neighbor classification and achieves a higher classification rate.

      Object tracking algorithms based on
      appearance models: A survey
      LI Na1,2,3,ZHAO Xiang-mo1,ZHAO Feng2,3,LIU Wei-hua2,3,WANG Qian2,3
      2017, 39(03): 524-533. doi:

      Visual object tracking is a hot topic in pattern recognition, computer vision and machine learning, with a wide range of applications in video surveillance, video compression coding, video retrieval, intelligent transport systems and so on. To help domestic and international peers better understand object tracking algorithms based on appearance models, we comprehensively review the latest research progress in this field. We first introduce the mechanism of tracking algorithms. Then we summarize current research on visual object tracking based on appearance models, including generative and discriminative methods. Thirdly, state-of-the-art tracking algorithms are compared on public datasets. Finally, we discuss future research trends in object tracking.

      Eyes localization based on integral
      projection and differential projection
      HOU Xiang-dan,ZHAO Dan,LIU Hong-pu,GU Jun-hua
      2017, 39(03): 534-539. doi:

      Face recognition is an important part of biometric identification. The eyes are among the most prominent features of the face, and their localization is a key step in face recognition. Integral projection is usually used for eye localization, but when this method is used directly, the horizontal position of the eyebrows is often mistaken for that of the eyes, which reduces accuracy. Therefore, after roughly marking out the face area, we add differential projection, namely the frequency of gray-value changes in the horizontal direction, to the integral projection. The accurate position of the eyes is determined by combining the integral projection and the differential projection. Experiments on the ORL face database show that the proposed method achieves 90.5% positioning accuracy, locating the eyes more precisely.
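
      The sketch below computes the two projection curves on a grayscale face region: the vertical integral projection (row sums of gray values) and a vertical differential projection (row sums of absolute horizontal gray-value changes), and combines them to pick the eye row. The combination weight and the upper-half restriction are illustrative assumptions, not the paper's exact rule.

# A minimal sketch of eye-row localization from integral and differential
# projections of a grayscale face region. Illustration only.
import numpy as np

def integral_projection(gray):
    """Row-wise sum of gray values (dark eye rows give low values)."""
    return gray.sum(axis=1).astype(float)

def differential_projection(gray):
    """Row-wise sum of absolute horizontal differences (eye rows vary strongly)."""
    return np.abs(np.diff(gray.astype(float), axis=1)).sum(axis=1)

def locate_eye_row(gray, weight=0.5):
    ip = integral_projection(gray)
    dp = differential_projection(gray)
    # normalize both curves, then favor dark rows (low ip) with strong variation (high dp)
    ip_n = (ip - ip.min()) / (np.ptp(ip) + 1e-9)
    dp_n = (dp - dp.min()) / (np.ptp(dp) + 1e-9)
    score = weight * ip_n - (1 - weight) * dp_n
    upper = gray.shape[0] // 2                   # eyes lie in the upper half of the face
    return int(np.argmin(score[:upper]))

if __name__ == "__main__":
    face = np.full((100, 80), 200, dtype=np.uint8)
    face[30:36, 15:30] = 40                      # synthetic dark "eye" stripes
    face[30:36, 50:65] = 40
    print(locate_eye_row(face))                  # expect a row around 30-35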

      A master-slave object surveillance system based
      on fisheye camera and PTZ camera
      WU Jian-hui1,2,SHANG Cheng1,ZHANG Guo-yun1,2,LI Jiao-jie1
      2017, 39(03): 540-546. doi:
      We propose a master-slave object surveillance system based on a fisheye camera and a PTZ camera, which covers a hemispherical field of view. The fisheye camera, with an ultra-wide field of view of about 180 degrees, works as the master camera, and the PTZ camera works as the slave camera, shooting high-resolution images of the objects under the control of PTZ parameters. Firstly, we use the moving blob model to detect the moving object in the fisheye image and calculate the object parameters of azimuth angle P′, elevation angle T′ and distance Z′ in the fisheye image space. The P′T′Z′ parameters are then mapped from the fisheye image space to the PTZ image space and output to control the PTZ camera. The P and T parameters are calculated with the help of the distortion coefficient of the fisheye lens, and the Z parameter is calculated from the relative size of the object in the fisheye and PTZ images. Test results of the PTZ parameters show that the relative errors meet the system requirements. In real outdoor experiments, the PTZ camera can point at the moving object and shoot it stably wherever the object is located in the fisheye image, with an appropriate object size and high resolution.
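
      To illustrate the fisheye-to-PTZ mapping, the sketch below converts a fisheye image point into pan/tilt angles under an ideal equidistant projection (r = f·θ) with the optical axis pointing straight up, plus a crude zoom estimate from relative object size. The paper uses a calibrated distortion coefficient instead; the focal length and geometry here are hypothetical.

# A minimal sketch of mapping a fisheye image point to pan/tilt angles under
# an ideal equidistant fisheye model. Illustration only.
import math

def fisheye_to_pan_tilt(u, v, cx, cy, f_pixels):
    """(u, v): pixel; (cx, cy): image center; f_pixels: equidistant focal length in pixels."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)                    # radial distance from the image center
    pan = math.degrees(math.atan2(dy, dx))    # azimuth P: direction around the optical axis
    theta = r / f_pixels                      # equidistant model: angle from the optical axis
    tilt = 90.0 - math.degrees(theta)         # elevation T above the horizon
    return pan, tilt

def estimate_zoom(current_zoom, obj_width_ptz, desired_width):
    """Crude Z estimate: scale the current zoom so the object fills the desired width."""
    return current_zoom * desired_width / max(obj_width_ptz, 1e-6)

if __name__ == "__main__":
    # Object blob centered at pixel (900, 700) in a 1280x1280 fisheye image.
    print(fisheye_to_pan_tilt(900, 700, cx=640, cy=640, f_pixels=400))
    print(estimate_zoom(current_zoom=1.0, obj_width_ptz=80, desired_width=400))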