  • Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Current Issue

    • Optimized scheduling algorithms for parallel task graphs
      LI Yufeng1,2,MO Zeyao2,XIAO Yonghao1,XIONG Min1
      2019, 41(06): 955-962. doi:
      Abstract ( 252 )   PDF (670KB) ( 263 )
      Many complex application problems in scientific and engineering computing can be solved through scientific workflow technology, and in the supercomputing domain these scientific workflows are modeled as parallel task graphs. Effective scheduling of parallel task graphs is essential for the efficient execution of applications. We propose a scheduling model for parallel task graphs under resource constraints, and give optimal scheduling conclusions for Fork-Join parallel task graphs. In addition, we propose a new scheduling algorithm for general parallel task graphs that takes into account the effect of data communication overhead on resource allocation and scheduling performance. We also improve the CPA algorithm under specific conditions. Our algorithms are compared with the commonly used CPR and CPA algorithms, and experimental results show that they achieve good scheduling results. The proposed scheduling algorithm and the optimality conclusions provide a useful reference for developing high-performance scheduling functions in workflow application systems.
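      The listing above summarizes the approach without giving the algorithm itself. As a rough, hypothetical illustration of resource-constrained scheduling for the independent inner tasks of a Fork-Join graph, the following Python sketch list-schedules tasks onto p processors (generic list scheduling, not the authors' algorithm):

      import heapq

      def fork_join_makespan(task_times, p):
          # List-schedule the independent inner tasks of a Fork-Join graph on
          # p processors: always give the next task to the least-loaded processor.
          tasks = sorted(task_times, reverse=True)   # longest-processing-time first
          procs = [0.0] * p                          # per-processor finish times
          heapq.heapify(procs)
          for t in tasks:
              earliest = heapq.heappop(procs)
              heapq.heappush(procs, earliest + t)
          return max(procs)

      print(fork_join_makespan([4, 3, 3, 2, 2, 1], p=2))   # -> 8.0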
       
      A configurable convolutional neural network
      accelerator based on tiling dataflow
       
      LI Yihuang,MA Sheng,GUO Yang,CHEN Guilin,XU Rui
      2019, 41(06): 963-972. doi:
      Abstract ( 225 )   PDF (1507KB) ( 201 )
      Convolutional neural networks (CNNs) are among the most widely used deep learning algorithms, with applications in image recognition, automatic translation and advertising recommendation. As neural networks grow, the numbers of neurons and synapses increase accordingly, so using dedicated acceleration hardware to exploit the parallelism of CNNs has become a popular choice. For hardware design, the classic tiling dataflow achieves high performance, but the utilization of the processing elements (PEs) in the tiling structure is very low. As deep learning applications demand higher hardware performance, accelerators require higher PE utilization. To this end, the scheduling order can be changed to improve performance, and input feature maps and output channels can be processed in parallel to increase computational parallelism. However, as the hardware performance demanded by neural network computation increases, the PE array inevitably grows larger and larger, and beyond a certain array size a single parallelization approach causes utilization to decline gradually. The hardware therefore has to exploit more dimensions of neural network parallelism to keep PEs from idling. At the same time, to adapt to different network structures, the hardware array must be configurable, but full configurability greatly increases hardware overhead and data scheduling difficulty. We therefore propose a configurable neural network accelerator based on the tiling dataflow. To reduce hardware complexity, we propose a partial configuration technique, which improves PE utilization for large arrays while keeping hardware overhead as low as possible. When the PE array size exceeds 512, utilization stays at 82%~90% on average, and accelerator performance scales almost linearly with the number of processing elements.
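      To see why a single parallel dimension caps PE utilization as arrays grow, a back-of-the-envelope calculation helps. The sketch below (all numbers illustrative, not from the paper) computes utilization as useful work divided by PE-cycles when one parallel dimension is folded onto the array:

      import math

      def pe_utilization(work_items, array_size):
          # Utilization when a single parallel dimension of `work_items` is
          # folded onto an array of `array_size` PEs, one slice per pass.
          passes = math.ceil(work_items / array_size)
          return work_items / (passes * array_size)

      # A single parallel dimension of 600 items on growing arrays:
      for size in (128, 256, 512, 1024):
          print(size, round(pe_utilization(600, size), 2))
      # 128 -> 0.94, 256 -> 0.78, 512 -> 0.59, 1024 -> 0.59: utilization falls
      # once the array outgrows the one dimension, hence the combined
      # feature-map/output-channel parallelism and partial configurability.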
       
      A hybrid memory trace collection
      and analysis toolkit for DDR4
       
      LI Zuojun1,2,CHEN Mingyu1,2,Qin Xiaoning3
      2019, 41(06): 973-980. doi:
      Abstract ( 200 )   PDF (1572KB) ( 204 )
      With the development of multicore technology, the popularity of big data, cloud computing and artificial intelligence applications, the gradual practicality of non-volatile memory technology, and the urgent need for information security, the design of memory systems, as the core part of data processing, is extremely important. However, existing analysis tools for memory systems cannot meet researchers' needs due to a variety of shortcomings. We redesign the hardware based on the original HMTT, realize complete, efficient and distortion-free memory trace collection on the latest DDR4-1600 platform, and further enhance the portability of the toolkit relative to the original system. Finally, we use the toolkit to collect memory traces of the latest SPEC CPU 2017 applications and analyze the collected traces, which further verifies the effectiveness of our work. Our work offers researchers a powerful tool for analyzing the memory access behaviors of applications and designing memory system architectures.
       
      A static and dynamic adaptive optimization
      method based on Java virtual machine
      ZHANG Haijun1,ZHENG Yan2,YE Jun1,BAI Shujing1
      2019, 41(06): 981-986. doi:
      Abstract ( 119 )   PDF (661KB) ( 195 )
      Dynamic languages can take advantage of profiling information at runtime to guide various program optimizations. However, the existing Java virtual machine does not effectively utilize the information collected at runtime: it simply discards the information when execution ends, and re-monitors and re-collects the information needed for optimization when the program is executed again. We therefore propose a static and dynamic adaptive optimization method based on the HotSpot virtual machine, which saves the optimal parameters or information obtained by iterative search over the optimized objects at runtime into a resource library. The method can learn from the resource library to obtain the best parameters or options for the current program, thereby effectively reusing the data accumulated at runtime. Resource analysis is static and offline, and adds no overhead to the running application. In the iterative learning process, the accuracy and efficiency of resource library learning are ensured by preventing redundant instances from entering the library and removing noisy instances from it. Experiments show that the proposal is practical in guiding adaptive optimization for Java virtual machines on different platforms.
       
      GA-Sim: A job running time prediction algorithm
      based on categorization and instance learning
      XIAO Yonghao,XU Lunfan,XIONG Min
      2019, 41(06): 987-992. doi:
      Abstract ( 177 )   PDF (701KB) ( 225 )

      In high performance computing job scheduling, many scheduling algorithms, especially backfilling algorithms such as EASY, depend on accurate estimates of job running time. Using user-supplied running times usually degrades scheduling performance significantly. We propose a job running time prediction algorithm based on categorization and instance learning, named GA-Sim, which considers both prediction accuracy and the underestimation problem. Numerical experiments on two actual scheduling logs show that, compared with IRPA and TRIP, GA-Sim reduces the underestimation rate while achieving higher prediction accuracy. We also analyze the numerical results in depth and give suggestions for choosing an appropriate prediction algorithm under different circumstances.
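      The abstract does not detail GA-Sim's learning step; as a minimal sketch of the instance-learning idea (predicting a job's runtime from its most similar historical jobs, with a hypothetical inflation factor to curb underestimation), consider:

      import numpy as np

      def predict_runtime(history, query, k=5, inflation=1.1):
          # `history`: list of (feature_vector, runtime) pairs for finished jobs
          # in the same category; features might encode user, queue, requested
          # cores, etc. (hypothetical encoding). Instance learning: average the
          # runtimes of the k nearest neighbors, then inflate the estimate a
          # little to trade some accuracy for a lower underestimation rate.
          feats = np.array([f for f, _ in history], dtype=float)
          times = np.array([t for _, t in history], dtype=float)
          dists = np.linalg.norm(feats - np.asarray(query, dtype=float), axis=1)
          nearest = np.argsort(dists)[:k]
          return inflation * times[nearest].mean()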

      A new scalable parallel simulator for
      polymer flooding in naturally fractured reservoirs
      ZHONG He1,LIU Hui1,CUI Tao2,WANG Kun1,HE Ye2,SHEN Lihua1,YANG Bo1,HE Ruijian1,ZHU Zhouyuan3,HU Jinghong4,CHEN Zhangxing1
      2019, 41(06): 993-1000. doi:
      Abstract ( 135 )   PDF (1175KB) ( 230 )      Review attachment

      As one of the main energy resources, oil plays an important role in transportation, industrial production and daily life, and a large number of technologies have been developed to maximize oil production. For instance, polymer flooding has been widely applied in China. Simulating polymer flooding in naturally fractured reservoirs is vital to sustaining oil production and extending the life of oil fields. We develop a scalable parallel reservoir simulator for real oil production from naturally fractured reservoirs using polymer flooding, with which reservoir engineers can study production technologies on powerful parallel computers and optimize oil production. Comparison with commercial software shows that our simulator is correct and efficient. Moreover, numerical experiments demonstrate that it is scalable and can handle large-scale reservoir models with hundreds of millions of grid blocks using thousands of CPU cores.

      CoFM: A hardware/software co-processing flow
      management mechanism for middlebox acceleration
      SHUI Xiaoyu,LI Junnan,SUN Zhigang,HUANG Jinfeng
      2019, 41(06): 1001-1008. doi:
      Abstract ( 140 )   PDF (706KB) ( 149 )     
      Flow management is the basis for middleboxes to implement stateful packet processing: it manages the connections established by end-to-end communication and their state. To meet the diverse flow management needs of different middleboxes, flow management functions are usually implemented in software frameworks such as Bro and mOS to ensure flexibility. However, due to the limited capability of CPUs, software flow management suffers from low throughput and high processing delay.
      To achieve both performance and flexibility, we propose a hardware/software co-processing flow management mechanism, called CoFM, to accelerate middleboxes. The basic idea of CoFM is to decouple flow management into application-independent mapping management (i.e., finding connections according to flow identifiers) and application-related connection management. The mapping management can be offloaded to hardware to reduce the number of memory accesses without affecting the flexibility of connection management. Furthermore, the mapping management supports dynamic insertion of new mappings and deletion of timed-out mappings, which reduce the number of hardware/software interactions and the hardware resource consumption, respectively. Finally, we implement the mapping management on FPGA in Verilog. The results show that CoFM's mapping management achieves a throughput of 50 Gbps with low processing latency (<1 μs) at a relatively small hardware resource cost, making it suitable for accelerating flow management.
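      As a software model of the decoupling idea, the following sketch mimics the mapping-management interface (flow identifier to connection ID, with timed-out entries evicted); in CoFM this table lives in FPGA hardware, so the dict-based version below is purely illustrative:

      import time

      class MappingTable:
          # Flow identifier (5-tuple) -> connection ID, with timeout eviction.
          def __init__(self, timeout=30.0):
              self.timeout = timeout
              self.table = {}                       # 5-tuple -> (conn_id, last_seen)

          def lookup(self, five_tuple):
              now = time.time()
              entry = self.table.get(five_tuple)
              if entry and now - entry[1] <= self.timeout:
                  self.table[five_tuple] = (entry[0], now)   # refresh on hit
                  return entry[0]
              return None                           # miss -> handled by software

          def insert(self, five_tuple, conn_id):
              self.table[five_tuple] = (conn_id, time.time())

          def evict_expired(self):
              now = time.time()
              dead = [k for k, (_, seen) in self.table.items() if now - seen > self.timeout]
              for k in dead:
                  del self.table[k]
              return len(dead)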
       
      Design and optimization of an N-state
      binary consensus algorithm
      LIU Hua1,YANG Chunxi1,HAN Guangsong2,XIE Kexin1
      2019, 41(06): 1009-1015. doi:
      Abstract ( 138 )   PDF (828KB) ( 150 )     
      Given the poor scalability and strong dependence on experience of existing binary consensus algorithms, we propose an N-state distributed binary consensus algorithm. Firstly, based on the average-consensus idea of the Gossip algorithm and the roulette-wheel idea, each node's state is updated according to its degree of deviation from the current state average of the wireless sensor network, and the initial probability distribution over all possible update states is calculated. Secondly, a genetic algorithm is applied to optimize the initial probability distribution and obtain an optimal probability distribution with better accuracy. The results show that the proposed algorithm achieves higher accuracy and shorter convergence time under the same number of states.
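      A minimal sketch of the roulette-wheel update over N candidate states (the probability values are hypothetical; in the algorithm they come from the initial calculation and the genetic algorithm's optimization):

      import random

      def roulette_select(states, probs):
          # Roulette-wheel selection: the update state is drawn according to
          # the current probability distribution over the N candidate states.
          r, acc = random.random(), 0.0
          for s, p in zip(states, probs):
              acc += p
              if r <= acc:
                  return s
          return states[-1]

      # Example with N = 4 quantized states; the distribution is hypothetical.
      states = [0.0, 1/3, 2/3, 1.0]
      probs = [0.1, 0.5, 0.3, 0.1]
      next_state = roulette_select(states, probs)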
       
      A formal engineering method for requirement
      modeling of airborne embedded control software
      HUANG Yihao,FENG Jincao,ZHENG Hanyue,MIAO Weikai,PU Geguang
      2019, 41(06): 1016-1025. doi:
      Abstract ( 188 )   PDF (860KB) ( 225 )     
      Embedded control software is one of the core components of modern aircraft. Constructing formal requirement specifications that accurately describe the intended functions and operational scenarios is recognized as a fundamental way to ensure the quality of such safety-critical software. However, industrial practitioners still face the big challenge of applying formal modeling in large-scale software projects due to the lack of a systematic engineering approach. Such an approach should provide practical technologies that guide practitioners in constructing formal specifications from scratch and in validating whether the specifications conform to the intended functions and scenarios. To tackle this challenge, we propose a formal engineering method called aero control software description language based modeling and validation (ACSDL-MV) for the requirement modeling of airborne control software. ACSDL-MV combines formal methods with traditional software engineering technologies by guiding practitioners to build formal specifications from the original requirement document in an evolutionary process. For building formal specifications, we design a formal description language for airborne control software, named the aero control software description language (ACSDL). To confirm that the requirement specifications accurately and sufficiently describe the software capabilities users expect, we provide diagram-based static review and formal-model-based dynamic simulation. Experimental results show that the approach can detect more potential errors than traditional review methods.
       
      A hazard analysis method based
      on failure propagation model
      GE Xiaoyu1,SHEN Guohua1,2,HUANG Zhiqiu1,2,DENG Liumeng1,WAN Weijian1
      2019, 41(06): 1026-1033. doi:
      Abstract ( 160 )   PDF (680KB) ( 239 )     
      Embedded realtime systems are extensively used in safetycritical environments, such as transportation, aerospace  and nuclear power systems. Although system design may not have any defects, random failures due to wear of physical components or sudden changes in the environment can  cause system hazards during operation. Currently, the hazard analysis methods based on failure propagation model either only consider failure propagation time or just failure probability, and do not comprehensively analyze the impact of the failure propagation time and the failure probability on the hazard analysis. Timed failure propagation graphs (TFPGs) are usually used to model the failure propagation process in the design phase of a safetycritical system, which includes failure propagation delay modeling. Considering the effect of the uncertainty of failure propagation path on the probability of the hazard occurrence, we propose a hazard analysis method, which uses the probabilisticTFPGs to model the failure propagation process. We also design an analysis algorithm to obtain the correlation between occurrence time and occurrence probability. Finally, a case is given to demonstrate the feasibility of the proposed approach.
       
      A monic execution sequence generation
      method for web service testing
      HE Juanjuan1,LIU Dongmei1,ZHU Hong2,DU Yining1,ZHOU Zijian1,ZHENG Xiaoyu1
      2019, 41(06): 1034-1043. doi:
      Abstract ( 121 )   PDF (897KB) ( 198 )     
      Automatic test case generation plays a key role in automatic testing for web services. Existing testing techniques based on algebraic specifications rely on the creation, initialization, and replication of the objects under test to verify the correctness of test results. However, because web services are third-party, such operations are usually unavailable, which makes it hard to transform test cases into executable sequences of operations. One possible solution is to convert a test case into a linear execution sequence that contains only one instance of the service under test, does not include instance initialization, and only modifies and checks the instance's states. We improve on the original work and propose the test execution graph with inverse terms (TEGI) to describe state changes during test case execution. We design a TEGI construction algorithm and a monic execution sequence generation algorithm, and implement prototype tools. A case study demonstrates that the proposal is feasible for automatic test case generation and can improve the testability of web services.
       
      A weight-guided filtering matching
      algorithm combining three measures
      GUO Xin1,2,WANG Yanjie1,FU Donghui1,2,FAN Bo1,2
      2019, 41(06): 1044-1049. doi:
      Abstract ( 169 )   PDF (561KB) ( 197 )     
      The census transform in existing stereo matching algorithms performs well in weak-texture regions, but it neglects the gray-level information of the image, leading to unsatisfactory matching in repetitive-texture regions. We therefore propose an improved census transform. In the initial matching cost stage, we design a similarity measure that combines the census transform, mutual information and gradient information. In the cost aggregation stage, we adopt an adaptive weight guided filtering aggregation strategy. Finally, the disparity map is obtained by disparity computation and optimization. We test the proposed algorithm on the standard image pairs provided on the Middlebury website, on the VS2015 software platform. Experimental results show that it obtains an accurate disparity map with an average mismatch rate of 5.29%, which meets the requirements of 3D reconstruction.
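      For reference, the baseline census transform that the paper improves on encodes each pixel by comparing its neighborhood with the center pixel and matches windows by Hamming distance; a minimal NumPy sketch (borders handled crudely by wrap-around) is:

      import numpy as np

      def census_transform(img, w=3):
          # Encode each pixel by comparing its w*w neighborhood with the
          # center pixel (1 if neighbor < center). Baseline transform only;
          # the paper fuses it with mutual information and gradient cues.
          r = w // 2
          out = np.zeros(img.shape, dtype=np.uint64)
          for dy in range(-r, r + 1):
              for dx in range(-r, r + 1):
                  if dy == 0 and dx == 0:
                      continue
                  shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # wraps at borders
                  out = (out << np.uint64(1)) | (shifted < img).astype(np.uint64)
          return out

      def hamming_cost(c_left, c_right):
          # Matching cost between census codes = Hamming distance (popcount of XOR).
          x = np.bitwise_xor(c_left, c_right)
          return np.array([bin(int(v)).count("1") for v in x.ravel()]).reshape(x.shape)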
       
      A fast object motion direction recognition
      method based on SIFT algorithm
      LI Hongfeng1,WEI Jingtao1,FU Yawei2,WANG Jiatao1
      2019, 41(06): 1050-1056. doi:
      Abstract ( 156 )   PDF (828KB) ( 190 )     
      Due to the limitations of objective conditions, imaging devices can only image at a low resolution and cannot distinguish the motion direction. We propose a fast object motion direction recognition method based on SIFT. Firstly, following the block matching algorithm, we use a corner feature extraction method to determine the position of a block, named the detection block. The next frame is traversed with this block, the matching block is found and named the target block, and the motion direction between the detection block and the target block is determined. Finally, the SIFT algorithm is used to match the feature points of the two images; three corresponding feature points are selected in each image to construct a triangle, and the transformation relationship between the two corresponding triangles yields a mathematical model that identifies the motion direction. Experimental results show that, compared with block matching, the motion direction recognition speed of our proposal is 3.3 times higher and the matching accuracy is 2.9 times higher, achieving faster and more accurate judgments. By comparison, the SIFT-based search algorithm is more feasible and can determine the moving direction of objects faster in multi-frame images.
       
      Face recognition by support vector machine
      optimized by an improved grey wolf algorithm
      FENG Zhang,PEI Dong,WANG Wei
      2019, 41(06): 1057-1063. doi:
      Abstract ( 168 )   PDF (784KB) ( 221 )     
      To address the low efficiency of combining two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA) for face feature extraction, we propose a face recognition method based on a support vector machine (SVM) optimized by an improved grey wolf algorithm (EGWO), with 2DPCA plus fast PCA for feature extraction. Combining 2DPCA with fast PCA reduces the dimensionality of the extracted features and the extraction time, thus reducing the recognition time required by the SVM. To improve the global search capability of the grey wolf optimizer (GWO), an elite opposition-based learning strategy is used to initialize the population, which effectively enhances GWO's exploration and exploitation capabilities. The improved algorithm then iteratively finds the best kernel and penalty parameters of the SVM, and the final trained classifier is applied to face recognition. The EGWO is compared with the GWO and the opposition-based learning grey wolf algorithm (OGWO) on six benchmark test functions in terms of convergence accuracy and speed, and it outperforms both. Experiments on face images from the ORL and Yale datasets show that the improved algorithm combined with the SVM model is better and more stable than the GWO, particle swarm optimization (PSO) and differential evolution (DE) algorithms combined with SVM.
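      A compressed sketch of GWO-style hyperparameter search for an SVM (plain GWO with random initialization; the paper's EGWO additionally seeds the population with elite opposition-based learning, and all settings below are illustrative):

      import numpy as np
      from sklearn.datasets import load_digits
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def gwo_svm(X, y, n_wolves=6, iters=10, bounds=((0.01, 100.0), (1e-4, 1.0))):
          # Positions are [C, gamma]; fitness is 3-fold cross-validated accuracy.
          lo = np.array([b[0] for b in bounds]); hi = np.array([b[1] for b in bounds])
          wolves = lo + np.random.rand(n_wolves, 2) * (hi - lo)
          fitness = lambda w: cross_val_score(SVC(C=w[0], gamma=w[1]), X, y, cv=3).mean()
          for t in range(iters):
              scores = np.array([fitness(w) for w in wolves])
              alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]
              a = 2 - 2 * t / iters                      # coefficient decreases linearly
              for i in range(n_wolves):
                  new = np.zeros(2)
                  for leader in (alpha, beta, delta):    # pull toward the 3 best wolves
                      r1, r2 = np.random.rand(2), np.random.rand(2)
                      A, C = 2 * a * r1 - a, 2 * r2
                      new += leader - A * np.abs(C * leader - wolves[i])
                  wolves[i] = np.clip(new / 3, lo, hi)
          return max(wolves, key=fitness)

      X, y = load_digits(return_X_y=True)
      best_C, best_gamma = gwo_svm(X[:300], y[:300])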
       
      A new automatic summarization method
      based on paragraph vector
      SHEN Qiangqiang,XIONG Zeyu,XIONG Yueshan
      2019, 41(06): 1064-1070. doi:
      Abstract ( 124 )   PDF (1107KB) ( 184 )     
      Automatic text summarization has very broad application prospects in many fields, such as web search and browsing recommendation. Classic text summarization algorithms use statistical methods to extract article keywords and topic sentences, which to some extent ignores the semantic and grammatical information of the text. Since distributed word vector embedding has been widely used in text summarization in recent years, we propose an automatic text summarization method based on word vector generation. The method comprises four modules: word vector generation, word-vector-based paragraph vector generation, keyword extraction, and topic sentence extraction, through which an automatic summary of a document is produced. Experimental results show that the improved method extracts topic sentences effectively.
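      A minimal sketch of the vector-based topic sentence selection: average word vectors into sentence/paragraph vectors and rank sentences by cosine similarity to the whole-document vector (simple averaging stands in here for the paper's paragraph vector module; the vocabulary below is hypothetical):

      import numpy as np

      def summarize(sentences, word_vecs, k=2):
          # Sentence/paragraph vector = mean of its word vectors; topic sentences
          # are those closest (cosine) to the whole-document vector.
          dim = next(iter(word_vecs.values())).shape
          def vec(text):
              ws = [word_vecs[w] for w in text.split() if w in word_vecs]
              return np.mean(ws, axis=0) if ws else np.zeros(dim)
          def cos(a, b):
              n = np.linalg.norm(a) * np.linalg.norm(b)
              return (a @ b) / n if n else 0.0
          doc = vec(" ".join(sentences))
          return sorted(sentences, key=lambda s: cos(vec(s), doc), reverse=True)[:k]

      # Tiny hypothetical vocabulary; real use would load trained embeddings.
      word_vecs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.9, 0.1]),
                   "car": np.array([0.0, 1.0])}
      print(summarize(["cat dog", "car", "dog"], word_vecs, k=1))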
       
      A local similarity prediction recommendation
      model based on improved CNN
       
      WU Guodong1,2,SONG Fugen1,TU Lijing2,SHI Mingzhe2
      2019, 41(06): 1071-1077. doi:
      Abstract ( 27 )   PDF (734KB) ( 78 )     
      To alleviate the data sparsity problem in recommendation systems, and exploiting the strength of convolutional neural networks (CNNs) in capturing local features, we propose a local similarity prediction recommendation model based on an improved CNN (LSPCNN), which adds an adjustment layer. The new model iteratively adjusts the initial user-item rating matrix to localize users' interest preferences, and then a CNN predicts the missing ratings to achieve personalized recommendation. Experimental results show that under different degrees of data sparsity, the MAE of the LSPCNN model decreases by 4% on average compared with traditional recommendation methods, effectively alleviating data sparsity and improving recommendation performance.
       
      An improved ant colony algorithm for multi-agent
      path planning in dynamic environments
       
      ZHENG Yanbin1,2,WANG Linlin1,XI Pengxue1,FAN Wenxin1,HAN Mengyun1
      2019, 41(06): 1078-1085. doi:
      Abstract ( 161 )   PDF (1001KB) ( 205 )     
      Aiming at the problem of multi-agent path planning in dynamic environments, we propose an improved dynamic path planning method that combines the ant colony algorithm with the fireworks algorithm. The method accelerates the iteration of the algorithm by adaptively setting the pheromone intensity value and the pheromone reduction factor, and uses the fireworks algorithm to resolve deadlocks in the path planning process and avoid falling into local optima. For multi-agent dynamic collision avoidance, corresponding avoidance strategies are chosen according to whether the trajectories of dynamic obstacles and agents intersect, and a path collision function is used to resolve head-on collisions between agents. Simulation results show that the proposed algorithm is superior to the traditional ant colony algorithm: it effectively solves the collision problem in multi-agent path planning and quickly finds the optimal collision-free path.
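      The core ant colony ingredient being adapted is the pheromone update; a generic sketch with intensity q and reduction (evaporation) factor rho, both of which the improved algorithm tunes adaptively, looks like:

      def update_pheromone(tau, ant_paths, q, rho):
          # tau: {edge: pheromone}; q is the pheromone intensity, rho the
          # reduction (evaporation) factor -- both adapted over iterations in
          # the improved algorithm, fixed here for illustration.
          for edge in tau:
              tau[edge] *= (1.0 - rho)              # evaporation
          for path, length in ant_paths:
              for edge in zip(path, path[1:]):      # consecutive node pairs
                  tau[edge] = tau.get(edge, 0.0) + q / length
          return tau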
       
      An imbalanced data classification method
      based on probability threshold Bagging
      ZHANG Zhonglin,WU Dangping
      2019, 41(06): 1086-1094. doi:
      Abstract ( 196 )   PDF (962KB) ( 198 )     

      The class imbalance problem exists widely in real life. Most traditional classifiers assume a balanced class distribution or equal misclassification costs, so their classification performance suffers severely on imbalanced data. Aiming at the classification of imbalanced data sets, we propose a probability-threshold Bagging classification algorithm, called PT-Bagging. The algorithm combines the threshold-moving technique with the Bagging ensemble: it trains on training sets drawn from the original distribution in the training phase, introduces decision threshold moving in the prediction phase, and uses calibrated posterior probability estimates to maximize the average performance measure of imbalanced data classification. Experimental results show that PT-Bagging classifies imbalanced data better.
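      A minimal scikit-learn sketch of the threshold-moving idea behind PT-Bagging: train Bagging on the original distribution, then move the decision threshold to the value that maximizes a validation metric over the estimated posterior probabilities (F1 is used here for concreteness; the metric and settings are illustrative, not the paper's exact measure):

      import numpy as np
      from sklearn.ensemble import BaggingClassifier
      from sklearn.metrics import f1_score
      from sklearn.tree import DecisionTreeClassifier

      def pt_bagging_predict(X_train, y_train, X_val, y_val, X_test):
          # Train Bagging on the original (imbalanced) distribution.
          clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
          clf.fit(X_train, y_train)
          # Move the decision threshold to maximize a validation metric over
          # the estimated posterior probability of the minority (positive) class.
          probs_val = clf.predict_proba(X_val)[:, 1]
          grid = np.linspace(0.05, 0.95, 19)
          best_t = max(grid, key=lambda t: f1_score(y_val, probs_val >= t))
          return (clf.predict_proba(X_test)[:, 1] >= best_t).astype(int), best_t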

      A spectral clustering algorithm based on Canopy clustering
      ZHOU Wei,XIAO Yang
      2019, 41(06): 1095-1100. doi:
      Abstract ( 136 )   PDF (710KB) ( 185 )     
      The traditional spectral clustering algorithm is sensitive to initialization. Aiming at this defect, we introduce the Canopy algorithm to perform coarse clustering and obtain initial cluster centers as the input of the K-Means step, and propose a spectral clustering algorithm based on Canopy clustering (Canopy-SC), which reduces the blind selection of initial centers in traditional spectral clustering. We apply the new algorithm to face image clustering. Compared with the traditional spectral clustering algorithm, Canopy-SC obtains better cluster centers and results as well as higher clustering accuracy. Experiments demonstrate its effectiveness and feasibility.
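      For reference, a minimal Canopy pre-clustering sketch that produces initial centers for K-Means (thresholds t1 > t2 are data-dependent and illustrative; only the tight threshold t2 matters for seeding centers here):

      import numpy as np

      def canopy_centers(X, t1, t2):
          # t1 > t2 are the loose/tight Canopy thresholds. Points within t2 of
          # a chosen center leave the candidate pool; the surviving centers
          # seed K-Means instead of random initialization. (t1 would govern
          # canopy membership; only t2 matters for seeding centers here.)
          assert t1 > t2
          pool = list(range(len(X)))
          centers = []
          rng = np.random.default_rng(0)
          while pool:
              i = pool[rng.integers(len(pool))]     # random remaining candidate
              centers.append(X[i])
              d = np.linalg.norm(X[pool] - X[i], axis=1)
              pool = [p for p, dist in zip(pool, d) if dist > t2]
          return np.array(centers)

      # e.g. init = canopy_centers(features, t1=2.0, t2=1.0)
      #      KMeans(n_clusters=len(init), init=init, n_init=1).fit(features)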
       
      A collaborative filtering recommendation algorithm
      based on a bee colony K-means clustering model
      LI Yanjuan,NIU Mengting,LI Linhui
      2019, 41(06): 1101-1109. doi:
      Abstract ( 194 )   PDF (813KB) ( 212 )
      To address the low recommendation quality and efficiency of current collaborative filtering recommendation algorithms, we propose a collaborative filtering recommendation algorithm based on an improved bee colony K-means clustering model. Firstly, based on user attribute information, the algorithm uses the improved bee colony K-means algorithm to cluster users and build a user clustering model. Secondly, we calculate the distance between the target user and the cluster centers in the model, and the cluster with the minimal distance is taken as the target user's retrieval space. Finally, we search for the target user's nearest neighbors by similarity calculation on the user-item rating matrix, and generate a recommendation list from the nearest neighbors' information. Experimental results show that the proposed algorithm achieves a lower MAE and shorter running time, enhancing both the quality and the efficiency of recommendation.