• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (04): 707-712.

• Artificial Intelligence and Data Mining •

Comparison and analysis of the ED algorithm and the SNP-index algorithm in calculating SNP sites: taking Arabidopsis thaliana as an example

GAN Qiu-yun

  1. (School of Applied Science and Engineering, Fuzhou Institute of Technology, Fuzhou 350014, China)
  • Received: 2020-05-15 Revised: 2020-12-10 Accepted: 2022-04-25 Online: 2022-04-25 Published: 2022-04-20
  • Funding:
    University-level Research Fund of Fuzhou Institute of Technology (FTKY21053)

Abstract: A single nucleotide polymorphism (SNP) is a variation at a single nucleotide base in a DNA sequence and is the most common type of heritable variation in organisms. The ED algorithm and the SNP-index algorithm are two commonly used algorithms for calculating SNP sites. Whole-genome sequencing data for the F2 generation of Arabidopsis thaliana were obtained by high-throughput sequencing. The sequencing data were filtered, screened, and aligned on a Linux platform, the two algorithms were implemented to produce results, and the number of SNP sites and the proportion of SNP genotypes detected by each algorithm were compared. The experimental results show that the ED algorithm yields more SNP sites, with a wider distribution and a higher relative distribution density than the SNP-index algorithm, but the number of SNP sites and the proportion of SNP genotypes obtained by the two algorithms are similar.

Key words: single nucleotide polymorphism (SNP), bioinformatics, ED algorithm, SNP-index algorithm
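The abstract does not state the formulas behind the two algorithms, so the following is only a minimal sketch based on the definitions commonly used in bulk-comparison analyses, not the author's implementation. It assumes that the SNP-index at a site is the fraction of reads carrying the non-reference allele, and that ED is the Euclidean distance between the A/C/G/T frequency vectors of two sequenced pools. The function names and the read counts are hypothetical and chosen purely for illustration.

```python
import math

BASES = ("A", "C", "G", "T")


def snp_index(alt_depth, total_depth):
    """Fraction of reads at a site that carry the non-reference allele (assumed definition)."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_depth / total_depth


def delta_snp_index(index_pool_a, index_pool_b):
    """Difference between the SNP-index values of two pools at the same site."""
    return index_pool_a - index_pool_b


def base_frequencies(counts):
    """Convert per-base read counts (dict keyed by base) into frequencies."""
    total = sum(counts.get(base, 0) for base in BASES)
    if total == 0:
        raise ValueError("site has no read coverage")
    return {base: counts.get(base, 0) / total for base in BASES}


def euclidean_distance(counts_pool_a, counts_pool_b):
    """Euclidean distance between the base-frequency vectors of two pools (assumed definition)."""
    freq_a = base_frequencies(counts_pool_a)
    freq_b = base_frequencies(counts_pool_b)
    return math.sqrt(sum((freq_a[b] - freq_b[b]) ** 2 for b in BASES))


if __name__ == "__main__":
    # Hypothetical read counts at one site; the reference base is assumed to be G.
    pool_a = {"A": 18, "G": 2}
    pool_b = {"A": 5, "G": 15}

    index_a = snp_index(pool_a["A"], sum(pool_a.values()))
    index_b = snp_index(pool_b["A"], sum(pool_b.values()))

    print("SNP-index, pool A:", round(index_a, 4))
    print("SNP-index, pool B:", round(index_b, 4))
    print("delta SNP-index:  ", round(delta_snp_index(index_a, index_b), 4))
    print("ED:               ", round(euclidean_distance(pool_a, pool_b), 4))
```

Under these assumptions, the SNP-index needs a reference allele to decide which reads count as variant, whereas ED compares the two pools directly and is unaffected by which allele is the reference. How either score is thresholded to call a site is not described in the abstract and is left out of the sketch.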