• Journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2021, Vol. 43, Issue (06): 1052-1059.

• Artificial Intelligence and Data Mining

A protein complex recognition algorithm based on graph embedding and topological structure information

XU Zhou-bo, LI Ping, LIU Hua-dong, LI Zhen

  1. (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)
  • Received: 2020-02-28  Revised: 2020-06-21  Accepted: 2021-06-25  Online: 2021-06-25  Published: 2021-06-22
  • Funding:
    National Natural Science Foundation of China (61762027, U1501252); Guangxi Natural Science Foundation (2017GXNSFAA198172)

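Abstract: Protein complexes are the basis for studying cell structure and biochemical mechanisms, and identifying them accurately has become an active research topic in recent years. Traditional algorithms that search for protein complexes using structural information suffer from low sensitivity and F-measure, while existing supervised learning algorithms identify protein complexes from hand-crafted features that do not adequately reflect the true information of the graph. To address these shortcomings, this paper proposes the graph2vec-SVM recognition algorithm. The algorithm treats a protein complex as a dense subgraph and takes the subgraph's modularity into account, uses graph2vec to convert graph information into vectors, and then applies an SVM classifier to identify protein complexes, which improves both the sensitivity and the F-measure of protein complex identification. Compared with four popular unsupervised learning algorithms (ClusterONE, CMC, HC-PIN and COACH) and three supervised learning algorithms (SCI-BN, SCI-SVM and RM), the proposed algorithm performs well on all three metrics: precision, sensitivity and F-measure.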

Keywords: protein complex, graph2vec, support vector machine (SVM), protein-protein interaction network
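The abstract names the stages of the graph2vec-SVM pipeline (dense candidate subgraph, modularity, graph2vec vector, SVM classifier) but not their details, so the following is only a minimal Python sketch of that shape under loudly stated assumptions, not the authors' implementation. graph2vec relabels each graph with the Weisfeiler-Lehman (WL) procedure and then learns a doc2vec-style embedding over the resulting rooted-subtree labels; here the raw WL label frequencies stand in for that learned embedding. The "modularity" score is assumed to be internal edges / (internal + boundary edges), which may differ from the paper's definition. The SVM is a plain hinge-loss linear SVM trained by dual coordinate descent. The toy network, the candidate sets and every function name (`build_toy_ppi`, `wl_labels`, `features`, `train_linear_svm`, `predict`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_toy_ppi():
    """Hypothetical toy PPI network: three 4-protein cliques hung off a 12-protein chain."""
    adj = defaultdict(set)

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    chain = [f"p{i}" for i in range(12)]
    for u, v in zip(chain, chain[1:]):
        link(u, v)
    cliques = []
    for prefix, anchor in zip("abc", ("p0", "p5", "p11")):
        members = [f"{prefix}{k}" for k in range(4)]
        for i, u in enumerate(members):
            for v in members[i + 1:]:
                link(u, v)
        link(members[0], anchor)  # a single bridge edge into the chain
        cliques.append(members)
    return adj, chain, cliques


def wl_labels(nodes, adj, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of the subgraph induced by the set `nodes`.

    graph2vec uses such rooted-subtree labels as the 'words' of a graph document;
    this sketch keeps the raw label counts instead of training a doc2vec model.
    """
    labels = {v: str(len(adj[v] & nodes)) for v in nodes}  # initial label: internal degree
    counts = Counter(labels.values())
    for _ in range(iterations):
        # Relabel each node by its own label plus the sorted labels of its neighbours.
        labels = {
            v: labels[v] + "(" + ",".join(sorted(labels[u] for u in adj[v] & nodes)) + ")"
            for v in nodes
        }
        counts.update(labels.values())
    return counts


def features(candidate, adj, vocab):
    """Feature vector: density, ASSUMED modularity score, WL label frequencies, bias."""
    nodes = set(candidate)
    n = len(nodes)
    e_in = sum(len(adj[v] & nodes) for v in nodes) // 2  # edges inside the candidate
    e_out = sum(len(adj[v] - nodes) for v in nodes)      # edges leaving the candidate
    density = 2 * e_in / (n * (n - 1))
    modularity = e_in / (e_in + e_out)  # assumption: not necessarily the paper's formula
    counts = wl_labels(nodes, adj)
    total = sum(counts.values())
    wl = [counts[label] / total for label in vocab]  # labels unseen in training are dropped
    return [density, modularity] + wl + [1.0]  # trailing 1.0 acts as a regularised bias


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, C=10.0, passes=200):
    """Hinge-loss linear SVM trained by dual coordinate descent."""
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    q = [dot(x, x) for x in X]  # diagonal of the Gram matrix
    for _ in range(passes):
        for i, (x, yi) in enumerate(zip(X, y)):
            grad = yi * dot(w, x) - 1.0
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)
            step = (new_alpha - alpha[i]) * yi
            w = [wj + step * xj for wj, xj in zip(w, x)]
            alpha[i] = new_alpha
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


# Positives: two known complexes (cliques); negatives: sparse 4-protein stretches of the chain.
adj, chain, cliques = build_toy_ppi()
train = [(cliques[0], 1), (cliques[1], 1)] + [(chain[i:i + 4], -1) for i in (0, 2, 4, 6)]
vocab = sorted(set().union(*(wl_labels(set(c), adj) for c, _ in train)))
X = [features(c, adj, vocab) for c, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)

# Score candidates that were not used for training.
print("held-out clique:", predict(w, features(cliques[2], adj, vocab)))
print("held-out chain stretch:", predict(w, features(chain[8:12], adj, vocab)))
```

On this toy input the held-out clique is labelled 1 and the held-out chain stretch -1. WL labels depend only on the shape of the induced subgraph, not on protein identities, which is why a clique never seen in training can still match the learned "complex" labels; on a real PPI network, candidate generation, the modularity definition and the embedding would all need to follow the paper itself.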
