• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (07): 1389-1397.

• Articles •

Application of nonnegative matrix factorization in microarray data classification and clustering discovery

REN Zhonglu (任重鲁), LI Jinming (李金明)

  1. (Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China)
  • Received: 2012-11-15 Revised: 2013-04-12 Online: 2014-07-25 Published: 2014-07-25
  • Supported by:

    Guangdong Provincial Special Fund for Talent Introduction in Universities (2011)

Abstract:

DNA microarrays (gene chips) are a typical microarray technology: they are high-throughput and can measure the expression levels of all genes in a genome simultaneously. A main goal of microarray assays is the discovery of gene expression patterns, that is, finding genome-wide clusters of genes with similar functions or related biological processes, or classifying samples to uncover their subtypes, for example classifying cancer samples by gene expression level to find the molecular subtypes of a disease. Nonnegative matrix factorization (NMF) is an unsupervised, non-orthogonal, parts-based matrix factorization method that in recent years has been applied increasingly to classification analysis and clustering discovery in microarray data. This paper systematically introduces the principle of NMF, its typical algorithm and some improved algorithms, and its applications, together with the biological interpretation of factorization results, the quality assessment of classification results, and existing classification software based on NMF. Finally, it summarizes and evaluates the performance of NMF in microarray data classification and clustering discovery.

Key words: nonnegative matrix factorization; microarray data; classification analysis; clustering discovery
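
As a rough illustration of the kind of analysis summarized above, the following sketch factors a nonnegative genes × samples matrix V into W (metagenes) and H (metagene expression per sample), then assigns each sample to the metagene with the largest coefficient. It is a minimal sketch under stated assumptions, not the procedure of the paper: the abstract does not say which algorithm is used, so Lee and Seung's multiplicative updates for the Frobenius-norm objective are assumed here as one common choice; the function names (`nmf`, `cluster_samples`, `make_toy_data`) are hypothetical; and the data are synthetic, not taken from any microarray experiment.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (genes x samples) as W @ H.

    Assumption: Lee-Seung multiplicative updates minimizing the
    Frobenius norm ||V - W H||. W is genes x k, H is k x samples.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Random strictly positive initialization keeps the updates well defined.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(H):
    """Assign each sample (column of H) to its dominant metagene."""
    return H.argmax(axis=0)


def make_toy_data(seed=0):
    """Synthetic 40 genes x 12 samples matrix with two sample groups."""
    rng = np.random.default_rng(seed)
    V = rng.random((40, 12)) * 0.1  # low nonnegative background
    V[:20, :6] += 1.0               # genes 0-19 are high in samples 0-5
    V[20:, 6:] += 1.0               # genes 20-39 are high in samples 6-11
    return V


if __name__ == "__main__":
    V = make_toy_data()
    W, H = nmf(V, k=2)
    print("cluster labels:", cluster_samples(H))
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

The rank k plays the role of the number of candidate subtypes. In practice it is usually chosen by comparing several values, and NMF results depend on the random initialization, so a single run like this one should be read only as a demonstration of the mechanics.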