• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (6): 106-110.

• Papers •

Application of the Information Retrieval Method in Handwritten Digit Recognition and Its Improvement

KONG Xu1, KONG Qiongxiang2, LI Yipeng1

  1. School of Science, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China
  2. School of Human Settlement and Civil Engineering, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China
  • Received: 2011-03-27 Revised: 2011-06-29 Online: 2012-06-25 Published: 2012-06-25
  • Supported by:

    National Natural Science Foundation of China (NSFC11071192); International Cooperation Project of the Ministry of Science and Technology of China (2010DFA14700); Natural Science Basic Research Plan of Shaanxi Province (SJ08E226); Fundamental Research Funds for the Central Universities (XJJ20100107)

Abstract:

We first apply the ideas and principles of latent semantic indexing (LSI), a text information retrieval method, to handwritten digit recognition. Each handwritten digit image is treated as a vector, and an unknown digit is recognized by computing its relevance to each training set and ranking the results. This approach has a small computational cost and a low recognition error rate (5.5%). Second, we arrange the training samples of all digits 0 to 9 into a single matrix, compute its singular value decomposition, and project each training sample onto a suitable number of left singular vectors. This yields a relevance computation in a low-order representation that keeps the low error rate while greatly compressing the training data (more than 95% of the data is removed). Finally, by distinguishing irregularly written samples from well-written ones, we reduce the recognition error rate further, to 4.5%.

Key words: handwritten digit classification; LSI; singular value decomposition
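
The low-order representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the relevance measure or how class scores are formed, so the code assumes cosine similarity and scores each digit class by its best-matching training sample. The function names `fit_low_rank` and `rank_digits`, the choice `k=10`, and the synthetic 64-pixel data are all hypothetical stand-ins for real digit images.

```python
import numpy as np

def fit_low_rank(train, k):
    """train: (n_pixels, n_samples) matrix with one training image per column."""
    # Thin SVD; the columns of U are the left singular vectors.
    U, _, _ = np.linalg.svd(train, full_matrices=False)
    Uk = U[:, :k]
    # Each training sample is stored as its k coordinates in the basis Uk.
    return Uk, Uk.T @ train

def rank_digits(x, Uk, coords, labels):
    """Rank digit classes by relevance to the unknown image x.

    Assumption: relevance is cosine similarity in the projected space,
    and a class is scored by its most similar training sample.
    """
    z = Uk.T @ x
    denom = np.linalg.norm(coords, axis=0) * np.linalg.norm(z) + 1e-12
    sims = (coords.T @ z) / denom
    classes = np.unique(labels)
    scores = np.array([sims[labels == c].max() for c in classes])
    order = np.argsort(-scores)  # highest relevance first
    return classes[order], scores[order]

# Synthetic stand-in for digit images: 10 random prototypes plus small noise.
rng = np.random.default_rng(0)
n_pixels, per_class = 64, 20
prototypes = rng.random((n_pixels, 10))
labels = np.repeat(np.arange(10), per_class)
train = prototypes[:, labels] + 0.05 * rng.standard_normal((n_pixels, labels.size))

Uk, coords = fit_low_rank(train, k=10)

# An unseen noisy copy of the prototype for class 3.
unknown = prototypes[:, 3] + 0.05 * rng.standard_normal(n_pixels)
ranked, scores = rank_digits(unknown, Uk, coords, labels)
predicted = int(ranked[0])
```

Only `Uk` and `coords` need to be kept after fitting, which is where the storage saving comes from. The size of the saving depends on `k`, the image size, and the number of training samples, so this toy setup should not be expected to reproduce the figures reported in the abstract.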