• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2014, Vol. 36, Issue (04): 765-771.


User’s interest clustering based on webpage probabilistic latent semantic information

QIAN Xuezhong, WU Zhiyuan

  1. (School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)
  • Received: 2012-09-24 Revised: 2013-03-29 Online: 2014-04-25 Published: 2014-04-25
  • Supported by:

    National Natural Science Foundation of China (61103129); Jiangsu Province Science and Technology Support Program (BE2009009)

Abstract:

To mine users’ interests accurately, a probabilistic latent semantic analysis (PLSA) model is first used to project the webpage-word matrix vectors into a probabilistic latent semantic vector space. An “automatic similarity threshold selection” method is then proposed to obtain the similarity threshold between web pages. Finally, a hierarchical agglomerative k-medoids (HAKmedoids) algorithm, which combines flat partitioning with agglomerative hierarchical clustering, is proposed to cluster users’ interest points. Experimental results show that HAKmedoids achieves better clustering quality than traditional partition-based algorithms. In addition, the proposed interest-point clustering technique can improve the efficiency of personalized recommendation and search in personalized services.

Key words: probabilistic latent semantic analysis; automatic similarity threshold selection; user interest points; hierarchical agglomerative k-medoids; personalized service
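
The pipeline outlined in the abstract (PLSA projection, a similarity threshold between pages, then medoid-based clustering) can be illustrated with a minimal Python sketch. This is not the authors' HAKmedoids implementation. The function names `plsa`, `cosine`, and `cluster_by_threshold`, the toy count matrix, the average-link merge rule, and the fixed threshold of 0.5 are all assumptions made for illustration, and the paper's automatic threshold selection is not reproduced.

```python
import math
import random


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a page-by-word count matrix; return P(z|d), P(w|z)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(size):
        raw = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(raw)
        return [v / total for v in raw]

    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]   # P(z|d)
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # P(w|z)
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint) or 1.0
                for z in range(n_topics):
                    resp = n_dw * joint[z] / norm
                    new_zd[d][z] += resp
                    new_wz[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [[v / (sum(row) or 1.0) for v in row] for row in new_zd]
        p_w_z = [[v / (sum(row) or 1.0) for v in row] for row in new_wz]
    return p_z_d, p_w_z


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_by_threshold(vectors, threshold):
    """Assumed scheme: average-link merging until the best similarity falls
    below `threshold`, then one medoid per cluster and a final reassignment."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] += clusters.pop(y)  # y > x, so index x stays valid

    # Medoid: the member with the highest total similarity to its cluster.
    medoids = [max(c, key=lambda i: sum(sim[i][j] for j in c)) for c in clusters]
    labels = [max(range(len(medoids)), key=lambda k: sim[i][medoids[k]])
              for i in range(n)]
    return medoids, labels


# Toy page-by-word counts (hypothetical data): two pages per theme.
counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 0],
    [0, 0, 0, 3, 4, 2],
    [0, 0, 1, 2, 3, 4],
]
page_topics, _ = plsa(counts, n_topics=2)
medoids, labels = cluster_by_threshold(page_topics, threshold=0.5)
print(medoids, labels)
```

Each row of `page_topics` is a page's topic mixture P(z|d). Groups of pages whose mixtures have an average cosine similarity at or above the threshold are merged, and each resulting group is represented by its medoid page.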