• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2006, Vol. 28 ›› Issue (12): 74-76.

• Article •

  

  • Online: 2006-12-01 Published: 2010-05-20

Abstract:

A problem with clustering analysis algorithms is that their results are generally not statistically tested. This paper develops a clustering analysis algorithm with randomized statistical testing. It consists of three parts: calculation of distance measures, randomized testing, and hierarchical clustering. In this algorithm, the between-sample distance is defined as 1 - p_test, where p_test is calculated from the randomization procedure for the two samples. If the between-class distance meets the p_test criterion, it is statistically reasonable to combine the two classes into one. Fourteen distance measures and three methods of hierarchical clustering are provided. The algorithm is implemented as a network program in the Java language, comprising 6 Java classes and an HTML file. The program can run in Java-enabled Web browsers. The algorithm is tested on an investigation of rice invertebrate diversity. The criteria for choosing distance measures and the prospects for improving the algorithm are discussed.
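The idea of a distance defined as 1 - p_test can be illustrated with a minimal sketch. The abstract does not specify the randomization procedure, so everything below is an assumption made for illustration: the pooled-shuffle scheme, the choice of Euclidean distance as one possible measure, the significance level `alpha`, and the function names `randomization_distance` and `should_merge` are all hypothetical. The sketch is written in Python rather than the Java of the original program.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length samples (one assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def randomization_distance(x, y, n_perm=999, measure=euclidean, rng=None):
    """Return 1 - p_test for two samples.

    Assumed scheme: pool the values of both samples, reshuffle them into two
    new samples n_perm times, and take p_test as the fraction of reshuffles
    whose distance is at least as large as the observed one.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    rng = rng or random.Random()
    d_obs = measure(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if measure(pooled[:n], pooled[n:]) >= d_obs:
            count += 1
    p_test = count / n_perm
    return 1.0 - p_test


def should_merge(distance, alpha=0.05):
    """Assumed merge criterion: combine two classes when p_test exceeds alpha."""
    return (1.0 - distance) > alpha
```

Under these assumptions, two identical samples give p_test = 1 and therefore a distance of 0, while two clearly different samples give a distance close to 1 and are not merged. A hierarchical clustering step would call `should_merge` on the between-class distance before combining two classes.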

Key words: cluster analysis; randomized statistical testing; distance measure; algorithm; network implementation