  • Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An essential proteins prediction algorithm based on participation degree in protein complex and density

MAO Yi-min, LIU Yin-ping

  1. (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
  • Received: 2018-12-21 Revised: 2019-05-02 Online: 2019-10-25 Published: 2019-10-25

Abstract:

Methods for identifying essential proteins in protein-protein interaction (PPI) networks tend to focus only on the topological characteristics of nodes, while PPI data contain a high proportion of false positives. Moreover, existing complex-based essential protein identification algorithms do not comprehensively consider the neighborhood information of nodes or the influence of complex mining on identification, so the accuracy and specificity of their results are low. To address these problems, an essential protein prediction algorithm based on participation degree in protein complexes and density (PEC) is proposed. First, GO annotation information and the edge aggregation coefficient are used to construct a weighted PPI network, which reduces the influence of false positives on the experimental results. A similarity matrix is then constructed from the edge weights of the protein interactions, and the maximum difference between eigenvectors is used to determine the number of partitions K automatically. Meanwhile, K initial cluster centers are selected according to the degree of the protein nodes in the weighted network. Spectral clustering and the fuzzy C-means (FCM) clustering algorithm are then combined to mine protein complexes, which improves clustering accuracy and reduces the data dimension. Second, a score for essential proteins is defined based on the degree of participation in protein complexes and the neighborhood subgraph density. Experimental results on the DIP and Krogan datasets show that, compared with 10 classic algorithms (DC, BC, CC, SC, IC, PeC, WDC, LIDC, LBCC and UC), PEC correctly identifies more essential proteins, with higher accuracy and specificity.
 

Key words: protein-protein interaction network, spectral clustering algorithm, protein complexes, density, essential proteins
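
Two of the topological quantities named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm. The "edge aggregation coefficient" is read here as the standard edge clustering coefficient: triangles through an edge divided by min(deg(u) - 1, deg(v) - 1). The "neighborhood subgraph" is taken to be the subgraph induced by a node together with its direct neighbors. The function names are hypothetical. The GO-based weighting, the clustering step, and the final PEC score are not reproduced.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Triangles containing edge (u, v) divided by min(deg(u) - 1, deg(v) - 1).

    Assumption: this standard definition is what the abstract calls the
    "edge aggregation coefficient". Returns 0.0 when the denominator is 0.
    """
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom <= 0:
        return 0.0
    common = adj[u] & adj[v]  # each common neighbor closes one triangle
    return len(common) / denom


def neighborhood_density(adj, u):
    """Density 2m / (n(n - 1)) of the subgraph induced by u and its neighbors.

    Assumption: the node itself is included in its neighborhood subgraph.
    """
    nodes = adj[u] | {u}
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return 2.0 * m / (n * (n - 1))


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy_adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
for u, v in [("A", "B"), ("C", "D")]:
    print(u, v, edge_clustering_coefficient(toy_adj, u, v))
print("C", neighborhood_density(toy_adj, "C"))
```

In the toy network, the edge inside the triangle scores higher than the pendant edge, which reflects the intuition that interactions embedded in dense neighborhoods are less likely to be false positives.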