• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (4): 144-149.

• Papers •

Web page ranking algorithm based on the PCM clustering algorithm

LIU Fasheng, ZHANG Juqin

  1. (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
  • Received: 2012-06-08 Revised: 2012-10-08 Online: 2013-04-25 Published: 2013-04-25
  • Supported by:

    Science and Technology Project of the Education Department of Jiangxi Province (GJJ11463)

Abstract:

Traditional web page ranking algorithms tend to ignore the topic relevance of search results and to suffer from topic drift. To address both problems, this paper proposes a web page ranking algorithm combined with the PCM clustering algorithm, which improves the topic relevance of the pages in the search results and reduces topic drift. First, for a query on a given topic, the random walk method (RWM) is used to compute the symmetric social distance (SSD) between each pair of web pages. Second, the SSD and the PCM clustering algorithm are used to cluster the pages into communities related to the topic, and the probability that each member belongs to its community is computed. Finally, the pages are ranked according to these membership probabilities and the recommendation degree of each page. Experimental results show that, compared with the PageRank algorithm, the proposed algorithm returns pages that are more relevant to the query topic. In addition, because the ranking is performed for a specific topic, the algorithm reduces topic drift.

Key words: ranking algorithm; RWM; SSD; PCM clustering algorithm
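
The three steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not expand PCM or define SSD or the recommendation degree, so the sketch assumes that PCM means possibilistic c-means, stands in for SSD with the Euclidean distance between t-step random-walk distributions, and uses normalized in-degree as a hypothetical recommendation degree. All function and parameter names (`walk_profiles`, `symmetric_distance`, `pcm_memberships`, `rank_pages`, `seeds`, `steps`, `m`) are illustrative.

```python
import numpy as np


def walk_profiles(adj, steps=3):
    """Row i is the distribution of a `steps`-step random walk started at page i."""
    adj = np.asarray(adj, dtype=float)
    adj = adj + np.eye(len(adj))  # self-loops so that no row is empty
    trans = adj / adj.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(trans, steps)


def symmetric_distance(profiles):
    """Stand-in for SSD (assumption): Euclidean distance between walk profiles."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.linalg.norm(diff, axis=2)


def _typicality(points, centers, eta, m):
    """Possibilistic memberships u[i, k] of point k in cluster i."""
    d2 = ((points[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def pcm_memberships(points, seeds, m=2.0, iters=50):
    """Possibilistic c-means (assumed meaning of PCM) with seed pages as initial centers."""
    centers = points[list(seeds)].copy()
    d2 = ((points[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    eta = d2.mean(axis=1, keepdims=True) + 1e-12  # fixed scale per cluster
    for _ in range(iters):
        w = _typicality(points, centers, eta, m) ** m
        centers = (w @ points) / w.sum(axis=1, keepdims=True)
    return _typicality(points, centers, eta, m)


def rank_pages(adj, seeds, steps=3):
    """Rank pages by (best community membership) x (hypothetical recommendation degree)."""
    adj = np.asarray(adj, dtype=float)
    profiles = walk_profiles(adj, steps)
    u = pcm_memberships(profiles, seeds)
    indeg = adj.sum(axis=0)
    recommend = indeg / max(indeg.max(), 1.0)  # assumption: normalized in-degree
    score = u.max(axis=0) * recommend
    order = np.argsort(-score, kind="stable")
    return order, score, u


if __name__ == "__main__":
    # Toy link graph: adj[i, j] = 1 means page i links to page j.
    links = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ])
    order, score, _ = rank_pages(links, seeds=(0, 3))
    print("ranking:", order.tolist())
    print("scores:", np.round(score, 3).tolist())
```

Clustering the walk profiles directly means the distance PCM sees is exactly the one returned by `symmetric_distance`, which keeps the sketch short. The paper's actual SSD and recommendation degree may differ, and a relational variant of PCM could operate on the distance matrix itself.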