Computer Engineering & Science

Architecture of a polymorphous parallel computer

LI Tao1,YANG Ting1,YI Xueyuan1,PU Lin1,QIAN Bowen1,HUANG Guangxin2

2014, 36(2): 191-200.

A novel and efficient polymorphous array architecture, Firefly2, is proposed. Its processing elements (PEs) can run in both SIMD and MIMD modes, and they support asynchronous inter-thread communication and efficient parallel execution of distributed instructions. Each PE contains a multithread manager that performs one-step context switching and a router for fast data communication. The architecture efficiently exploits parallelism at the thread, data, and instruction levels. In particular, its performance on stream processing is comparable to that of an ASIC. It can also implement high-performance classical static and dynamic dataflow computation. The architecture targets computer graphics, image processing, and digital signal processing applications.

Design and implementation of high radix Montgomery modular multiplication array structures

WU Guiming，XIE Xianghui，WU Dong，ZHENG Fang，YAN Xinkai

2014, 36(2): 201-205.

Two linear arrays for high-radix Montgomery modular multiplication are proposed. They use two different parallelization methods, both of which exploit pipelined parallelism through task assignment and task scheduling along different loop dimensions. The two linear arrays, for 256-bit modular multiplication with radix 2^16, are implemented on a Xilinx XC5VLX330 FPGA. Experimental results show that both linear arrays have a latency of 84 cycles, with throughputs of 1/17 and 1/21, respectively. Compared with related work, the designs achieve higher throughput while balancing performance against hardware overhead.
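
For orientation, the arithmetic that such arrays parallelize can be sketched in software. The following is a minimal word-serial Montgomery multiplication, assuming 16-bit words (radix 2^16) and 16 words per 256-bit operand. It is a generic textbook formulation, not the paper's linear-array design; the function name and parameters are illustrative, and it needs Python 3.8+ for the modular inverse via `pow`.

```python
def montgomery_multiply(a, b, n, word_bits=16, num_words=16):
    """Return a * b * R^-1 mod n, where R = 2**(word_bits * num_words).

    Assumes n is odd, n < R, and 0 <= a, b < n.
    """
    base = 1 << word_bits
    mask = base - 1
    # n_prime = -n^-1 mod 2^w, precomputed once per modulus.
    n_prime = (-pow(n, -1, base)) % base
    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask    # i-th radix-2^w digit of a
        t += a_i * b                           # partial product
        q = ((t & mask) * n_prime) & mask      # makes t + q*n divisible by 2^w
        t = (t + q * n) >> word_bits           # exact division by the radix
    if t >= n:                                 # t < 2n, so one subtraction suffices
        t -= n
    return t


# Usage: multiply two numbers modulo a 255-bit odd modulus.
n = (1 << 255) - 19
R = 1 << 256
a, b = 123456789, 987654321
a_m, b_m = a * R % n, b * R % n                # convert to Montgomery form
prod_m = montgomery_multiply(a_m, b_m, n)      # product, still in Montgomery form
result = montgomery_multiply(prod_m, 1, n)     # convert back
```

Each loop iteration consumes one 16-bit digit of `a`, which is the loop dimension a hardware array would pipeline across.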

Vectorizable implementation of adaptive deblocking filter on YHFT-Matrix DSP

LI Yong,CHEN Shuming,CHEN Shenggang

2014, 36(2): 206-210.

A parallel vector implementation of the deblocking filter in the H.264 coding algorithm is mapped onto the YHFT-Matrix DSP. The deblocking filter is first analyzed theoretically. The vector data access unit, the vector processing unit, the data shuffle unit, and the flexible matrix are then used together to exploit the parallelism of the deblocking filter algorithm. In the experiments, the algorithm is mapped onto both YHFT-Matrix and the TI TMS320C6415, and the results show that YHFT-Matrix outperforms the TI TMS320C6415.

A novel embedded memory system for stride accesses

Lv Hui1，DING Yajun2，ZHENG Fang1，WU Dong1，XIE Xianghui1

2014, 36(2): 211-215.

A novel multiprime embedded memory system is proposed, based on the theory of prime memory system and memory access scheduling. The system can significantly improve the performance of stride memory access at low hardware cost. Theoretical analysis and the experimental results prove the correctness and effectiveness of the system.

Design and implementation of a NIC based RDMA reliable communication protocol

XIA Jun，PANG Zhengbin，LIU Lu，ZHANG Jun，CHANG Junsheng

2014, 36(2): 216-221.

As high performance computing systems continue to grow in size and complexity, reliability has become a crucial factor affecting their availability. The system network is an important component of such systems, and its reliability must be considered in system design. To address failures that may occur in the system network, the paper proposes a NIC-based reliable RDMA communication protocol, gives a general framework for realizing the protocol, and discusses several optimized implementation methods based on the framework. The protocol and its implementation tolerate system network failures and reduce the overhead of reliable communication. Experimental results show that the performance of reliable RDMA communication is comparable to that of connectionless RDMA communication.

Revision of deviation between the pictures generated from GDSII data and the photos of die

HU Xing1,KUANG Jishun1,LI Shaoqing2

2014, 36(2): 222-225.

As the number of steps in die design and manufacture increases, it becomes increasingly possible for hardware Trojans to be implanted in a chip. It is therefore necessary to check dies for hardware Trojans, especially dies with high security requirements. One of the main methods is to reverse engineer the die and check it for consistency with the original GDSII file. An important step in this method is to generate pictures from the GDSII file and make them correspond to the photos obtained by reverse engineering the die. We therefore propose a two-point positioning algorithm for segmenting the GDSII pictures. In addition, to handle the position offset introduced while photographing the die and splicing the photos, we propose a correction algorithm based on deviation statistics. Practical engineering application shows that the algorithm corrects the deviation between the pictures generated from the GDSII file and the photos of the die well, and eliminates the impact of the position offset.

A multi-scale management method for visualization of vector data on server cluster

SUN Lu1,CHEN Luo1,LIU Lu1,SU Deguo2

2014, 36(2): 226-232.

A multi-scale management method for visualizing geographic vector data on a server cluster is proposed. Based on the idea of vector data tiling, a global tile-pyramid index model is established, and the vector dataset is divided into individual vector tiles according to the index structure. When tiles are rendered on the servers, the tiled dataset serves as the feature data source, which avoids spatial queries on the raw dataset on the fly. Experimental results indicate that the proposed method reduces data preparation time and I/O cost when rendering a tile image, and consequently improves the performance of geographic vector data visualization.
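
A tile-pyramid index of this kind can be sketched as follows. The abstract does not specify the paper's index model, so the grid here is an assumption: level z splits the longitude/latitude extent into 2^z columns and 2^z rows, with tile (0, 0) at the north-west corner. The function names `tile_of` and `tiles_for_bbox` are hypothetical.

```python
def tile_of(lon, lat, level):
    """Return (col, row) of the tile containing a point at a pyramid level.

    Assumed scheme: 2**level columns over [-180, 180] degrees of longitude
    and 2**level rows over [90, -90] degrees of latitude.
    """
    n = 1 << level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)  # clamp the east edge
    row = min(int((90.0 - lat) / 180.0 * n), n - 1)   # clamp the south edge
    return col, row


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, level):
    """List the (level, col, row) keys of all tiles a bounding box overlaps."""
    c0, r0 = tile_of(min_lon, max_lat, level)         # north-west corner
    c1, r1 = tile_of(max_lon, min_lat, level)         # south-east corner
    return [(level, c, r)
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)]


# A feature straddling the origin falls into all four level-1 tiles.
keys = tiles_for_bbox(-10.0, -10.0, 10.0, 10.0, level=1)
```

Assigning each feature to the tiles its bounding box overlaps, once and ahead of time, is what lets a renderer read a tile's features directly instead of running a spatial query per request.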

A class-based data race static detection algorithm for Java multithread programs

SONG Donghai,BEN Kerong,ZHANG Zhixiang

2014, 36(2): 233-237.

The widespread use of multithreaded concurrent programs leads to more harmful data races, so race detection is very important for improving software quality. Combining static data race detection with static program slicing, a class-based static data race detection algorithm for Java multithreaded programs is proposed. The algorithm builds function call chains from function calls, analyzes every field of a class to find possible data races, narrows the range of program analysis through static program slicing, and removes impossible data races by checking the necessary conditions for a data race. An example demonstrates that the proposed algorithm can guide programmers in fixing data race defects.

Research of multi-homed network service deployment based on strategy DNS and HTTP Proxy

WANG Zirong,HU hao,YIN Shaofeng,WANG Yuke

2014, 36(2): 238-243.

In current multihomed network service deployment methods, the service access route is unreachable due to the changed ISP network addresses, DNS configuration, or other factors. Aiming at this drawback, the paper proposes a strategy of multihomed network service deployment based on strategy DNS and HTTP Proxy. The proposed strategy installs HTTP Proxy server on each ISP output and works with DNS server to perform joint deployment. The strategy is analyzed theoretically and the deployment method is described in detail. Case analysis and effect evaluation demonstrate that the proposed strategy can greatly improve the speed and quality of internet user access of the campus network and enterprise network information resource services, without changing the network topology and increasing no more investment basically. It is concluded that the strategy solve the problem that the route is unreachable in the traditional way.

Research on middleware for wireless sensor network

WANG Lin,JIANG Jie

2014, 36(2): 244-249.

Wireless sensor networks are widely used and attract increasing attention, but they lack a unified open interface as network complexity and the number of network applications grow. In view of the structure and characteristics of wireless sensor networks, the concept of wireless sensor network middleware is introduced. Based on a comprehensive analysis of the problems and challenges facing such middleware, several concrete middleware design approaches are presented, and several typical middleware instances are compared on important performance parameters such as QoS support, reliability, and mobility. Finally, an improved method for the QoS mechanism of middleware is proposed.

A Petri net based Web services composition security dynamic detecting technology

ZHOU Jie,REN Jiangchun,WANG Zhiying,CHENG Yong,MEI Songzhu

2014, 36(2): 250-257.

Security detection for Web services composition is important for enhancing the security of Web services composition in complex network environments. To solve existing problems in this area, the paper proposes a dynamic security detection framework for Web services composition and analyzes its key technologies in detail, including services composition modeling, the security model, and the security detection algorithm. Finally, a test example shows that the framework realizes the security detection function.

Trust management for Web services based on extended UDDI

MENG Dong,CAO Jienan,ZHU Peidong

2014, 36(2): 258-264.

With the rapid development of Internet technology, Web services have become more widespread in recent years, but the traditional Web services discovery mechanism cannot satisfy the requirements of clients because of its limitations. Meanwhile, current research mainly focuses on QoS-based Web services models and computation, but neglects services management. To address this problem, a trust management mechanism based on natural attenuation and user feedback is proposed, which eliminates invalid services and provides reference evidence for clients to select the best service. Through this mechanism, the usefulness and usability of the Web services discovery system can be improved.

Performance analysis of the 3G-HF high-rate data link protocol

WU Xiaohe1,RAN Maoru2,XI Yong1,ZHOU Li1

2014, 36(2): 265-269.

Domestic research on third-generation (3G) HF communication technology is still at an early stage, and most HF radio systems are based on the second generation. Compared with second-generation HF data link protocols, third-generation HF data link protocols are much more efficient. In this paper, we study the performance of the 3G-HF High-Rate Data Link Protocol (HDL) as defined in the U.S. standard MIL-STD-188-141B. Under different conditions, we evaluate the bit error rate, packet error rate, average number of transmissions, and throughput through simulation and theoretical analysis.

Comparison of ionospheric delay accuracy between GPS and Beidou

ZHANG Feizhou,YANG Zemin,CHENG Peng,ZHAO Lijun

2014, 36(2): 270-274.

When GPS and Beidou are used together in a dual-system, dual-frequency receiver, the main factor influencing the accuracy of the ionospheric delay is the set of ionospheric parameters in each system's ephemeris. Because the two systems' parameters differ, applying the same ionospheric delay model to compute the ionospheric TEC gives somewhat different results. With a GPS/Beidou multimode receiver, the ionospheric parameters in the GPS and Beidou ephemerides can be obtained and the delay TEC computed with the ionospheric model. The IGS ionospheric TEC at the same place and time is then taken as the reference against which the ionospheric delay accuracy of GPS and Beidou is compared. In the experiment, the parameters in the two systems' ephemerides differ significantly, so the same Klobuchar model yields different TEC values. After the IGS reference value is subtracted from the GPS and Beidou TEC results, the final results show that the accuracy of GPS is higher than that of Beidou.

A novel domain adaptation approach based on data classification

GU Xin1,2,WANG Shitong1

2014, 36(2): 275-285.

General machine learning assumes that the training data and test data have the same distribution, whereas domain adaptation algorithms aim to handle different but similar distributions among training sets, and have a wide range of applications such as transfer learning, data mining, data correction, and data projection. The support vector machine (SVM) attempts to find an optimal separating hyperplane for binary classification problems in high-dimensional space so as to minimize the classification error rate. CCMEB, proposed by I. Tsang as an improvement of the CVM, is particularly suitable for training on large datasets. In this article, SVM and CCMEB are combined with probability distribution theory to formulate a novel domain adaptation approach (CCMEBSVMDA). By calculating the center of each dataset, we can correct the dataset or measure the similarity of data between different domains. This fast algorithm has good adaptability. As validation, we test it on UCI data and text classification data, and the experimental results indicate the effectiveness of the proposed algorithm.

A optimized realtime operation for explosives storage based on RFID and genetic algorithm

FU Huawei1，HE Xiaomin1，XU Liang1,3，LI Xiuxi2,HUANG Zhiping3

2014, 36(2): 286-291.

To address optimization problems in the management of explosives storage, an optimized online operation method based on Radio Frequency IDentification (RFID) technology and a genetic algorithm is proposed, making explosives storage management more efficient, information-based, secure, and intelligent. Information about the explosives warehouse is acquired in real time by RFID, and an assignment strategy for storage locations in the warehouse is proposed. A mathematical model for explosives storage optimization is constructed by analyzing the operating characteristics and requirements of explosives storage, and this complicated model is solved by the genetic algorithm. Simulation results show that the proposed method improves the utilization of warehouse space, optimizes the walking path when storing and retrieving explosives, and solves the operating problems under constraints such as the validity period of the explosives.

Adaptive license plate location in complex weather

YU Ming，LU Qianqian，LIU Yufei，WANG Fei

2014, 36(2): 292-297.

To address the low location rate of traditional license plate algorithms when images are captured in changing weather or insufficient light, an adaptive license plate location algorithm is proposed. The algorithm identifies the weather condition and the image contrast from the color characteristics and image definition. Wavelet transform coefficients are used to reduce noise and stretch the contrast, thereby enhancing the image. Finally, vertical projection and template matching are combined to locate the license plate. Tests show that the algorithm effectively removes noise and locates license plates of different contrast in sunny, rainy, foggy, and dusk conditions. The average license plate location rate is 93.4%.

Murals inpainting based on color clustering image segmentation and the improved FMM algorithm

REN Xiaokang,DENG Linkai

2014, 36(2): 298-302.

In recent years, murals have been damaged to some extent by various environmental and human factors. To let people appreciate the original style and features of murals, support research on and development of these cultural resources, and restore the original appearance of the frescoes, the paper proposes an algorithm for repairing fading and scratches in murals. Working in the Lαβ color space, the algorithm proposes, for the first time, using color clustering and masking to segment and extract the damaged mural regions. In addition, because the propagation direction of the FMM algorithm does not fully cover the regions to be repaired, we use features of the gradient histogram to optimize the propagation direction. Experiments show that the algorithm is effective at repairing damaged murals.

Fast fully affine invariant image matching based on ORB

HOU Yi,ZHOU Shilin,LEI Lin,ZHAO Jian

2014, 36(2): 303-310.

Affine-SIFT (ASIFT) is fully affine invariant, but its computation is time-consuming. Oriented FAST and Rotated BRIEF (ORB) is extremely fast but is not affine invariant. To balance good affine invariance against real-time performance in image matching, a new fast method for fully affine invariant image matching based on ORB, called AORB (Affine-ORB), is proposed. It applies the ASIFT approach of simulating all views to give ORB full affine invariance. First, it simulates enough of the image views obtainable by varying the camera viewpoint. Second, all simulated image pairs are matched using fast ORB, which yields full affine invariance. Experimental results show that the proposed method is efficient in fully affine invariant image matching and is 6 times faster than ASIFT.
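
The view-simulation step can be sketched by enumerating the camera tilts and rotations to be simulated. The sampling below follows the scheme described for ASIFT (tilts in a geometric series with ratio sqrt(2), in-plane rotations in steps of 72 degrees divided by the tilt, covering less than 180 degrees); whether AORB uses exactly these values is an assumption, and the function name is hypothetical. Warping the images and running ORB on each view are left out.

```python
def simulated_views(max_tilt_exp=5, base_step_deg=72.0):
    """Enumerate (tilt, rotation_deg) pairs for affine view simulation.

    Assumed sampling (as described for ASIFT): tilt = 2**(k/2) for
    k = 0..max_tilt_exp, rotation in steps of base_step_deg / tilt,
    covering [0, 180) degrees. The untilted view needs no rotation.
    """
    views = [(1.0, 0.0)]                       # the original, untilted image
    for k in range(1, max_tilt_exp + 1):
        tilt = 2.0 ** (k / 2.0)
        step = base_step_deg / tilt            # finer rotation steps at high tilt
        i = 0
        while i * step < 180.0:
            views.append((tilt, i * step))
            i += 1
    return views


views = simulated_views()
tilt_two = [phi for tilt, phi in views if tilt == 2.0]   # rotations at tilt 2
```

Every view of one image would be matched against every view of the other, so the number of views directly drives the matching cost; this is where a fast descriptor such as ORB pays off.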

Color image watermarking scheme based on HVS and relationship in DCT domain

XIONG Xiangguang,WANG Duanli

2014, 36(2): 311-316.

A novel watermarking technique based on the human visual system (HVS) and coefficient relationships in the discrete cosine transform (DCT) domain is proposed. First, the binary watermark image is processed by chaotic encryption and Arnold scrambling. Second, each component of the original color image is divided into 8×8 blocks, and the DCT is applied to each block. Finally, according to the brightness and texture masking features of each sub-block and the watermark pixel values, the relationship between the selected DCT coefficients is adaptively adjusted to embed the watermark. Experiments show that the proposed algorithm has good transparency and resists a variety of attacks, and that a watermark embedded in the red component is more robust against compression, additive noise, and similar attacks than one embedded in the blue or green component. Compared with similar algorithms, the proposed algorithm performs better.
