J4 ›› 2014, Vol. 36 ›› Issue (4): 765-771.
• Article •
钱雪忠,吴志媛
Supported by:
National Natural Science Foundation of China (61103129); Jiangsu Province Science and Technology Support Program (BE2009009)
QIAN Xuezhong,WU Zhiyuan
Abstract:
To mine users' points of interest accurately, a probabilistic latent semantic analysis (PLSA) model is first used to project the webpage-word matrix vectors into a probabilistic latent semantic vector space. An “auto-selected similarity threshold” method is then proposed to obtain the similarity threshold between web pages. Finally, a hierarchical agglomerative k-medoids (HAKmedoids) algorithm, which combines flat partitioning with agglomerative hierarchical clustering, is proposed to cluster users' points of interest. Experimental results show that HAKmedoids achieves better clustering results than traditional partition-based algorithms. The proposed interest clustering technique can also improve the efficiency of personalized recommendation and search in personalized services.
Key words: probabilistic latent semantic analysis; auto-selected similarity threshold; user's interest points; hierarchical agglomerative k-medoids; personalized service
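The projection step described in the abstract can be illustrated with a minimal sketch of standard PLSA fitted by EM, followed by cosine similarity between pages in the resulting topic space. This is not the authors' implementation: the function names `plsa` and `cosine_similarity_matrix`, the toy count matrix, the number of topics, the iteration count, and the choice of cosine similarity are all assumptions made for illustration.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (pages x words) count matrix.

    Returns P(z|d) with shape (pages, topics) and P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random, row-normalized initial distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z); shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return unit @ unit.T

# Hypothetical toy webpage-word counts: pages 0-1 share one vocabulary, pages 2-3 another.
counts = np.array([
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)   # each page as a topic mixture P(z|d)
sim = cosine_similarity_matrix(p_z_d)     # page-to-page similarity in topic space
```

Each row of `p_z_d` is a page's coordinates in the latent semantic space, and `sim` is the kind of page-to-page similarity that a threshold could then be applied to. The threshold selection rule and the HAKmedoids clustering step are not sketched here, since the abstract does not specify them.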
QIAN Xuezhong, WU Zhiyuan. User's interest clustering based on webpage probabilistic latent semantic information [J]. J4, 2014, 36(4): 765-771.
Link to this article: http://joces.nudt.edu.cn/CN/
http://joces.nudt.edu.cn/CN/Y2014/V36/I4/765