• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (4): 765-771.

• Articles •

User’s interest clustering based on
webpage probabilistic latent semantic information

QIAN Xuezhong,WU Zhiyuan   

  1. (School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)
  • Received: 2012-09-24 Revised: 2013-03-29 Online: 2014-04-25 Published: 2014-04-25

Abstract:

To mine users’ interests accurately, a probabilistic latent semantic analysis (PLSA) model is first used to project the webpage-word matrix into a probabilistic latent semantic vector space. An “auto-selected similarity threshold” method is then proposed to obtain the similarity threshold between web pages. Finally, by combining partitioning algorithms with hierarchical agglomerative clustering, a hierarchical agglomerative k-medoids clustering algorithm is proposed to cluster users’ interests. The experimental results show that, compared with traditional partitioning algorithms, the hierarchical agglomerative k-medoids algorithm achieves a better clustering effect. Furthermore, clustering users’ interests can improve the efficiency of personalized recommendation and search in personalized service fields.
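
The projection step can be illustrated with a minimal sketch. The abstract does not give the update equations, so this assumes the standard EM formulation of PLSA. The function names (`plsa`, `cosine`), the toy count matrix, and the use of cosine similarity between topic vectors are illustrative assumptions, not the authors' implementation. The threshold selection and the hierarchical agglomerative k-medoids steps are not sketched here, because the abstract does not specify them.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a page-by-word count matrix (list of lists).

    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    Standard EM updates are assumed; the paper may differ in detail.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation; every row is a probability distribution.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    resp = n_dw * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise the expected counts (keep old row if it is empty).
        p_z_d = [normalize(new) if sum(new) > 0 else old
                 for new, old in zip(new_z_d, p_z_d)]
        p_w_z = [normalize(new) if sum(new) > 0 else old
                 for new, old in zip(new_w_z, p_w_z)]
    return p_z_d, p_w_z


def cosine(u, v):
    """Cosine similarity of two topic vectors (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: 4 pages, 6 words, two disjoint vocabularies.
counts = [
    [4, 3, 5, 0, 0, 0],
    [5, 4, 3, 0, 0, 0],
    [0, 0, 0, 4, 5, 3],
    [0, 0, 0, 3, 4, 5],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
similarity = [[cosine(u, v) for v in p_z_d] for u in p_z_d]
```

Each row of `p_z_d` is a page's coordinates in the latent semantic space, and `similarity` is the pairwise page similarity matrix that a threshold could then be applied to before clustering.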

Key words: probabilistic latent semantic analysis; auto-selected similarity threshold; user’s interest points; hierarchical agglomerative k-medoids; personalized service