  • Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2014, Vol. 36, Issue (03): 545-550.

Influence Estimation Algorithm for Specific Documents in Microblogs Based on an Enhanced Inverted Index

司宏伟   

  1. (College of Computer, National University of Defense Technology, Changsha, Hunan 410073)
  • Received: 2012-07-14  Revised: 2012-10-12  Online: 2014-03-25  Published: 2014-03-25
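  • Funding:

    National High-Tech R&D Program of China (863 Program) (2011AA010702, 2012AA01A402); National Natural Science Foundation of China (91124002); National Key Technology R&D Program (2012BAH38B06)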

Estimating the influence of documents: An enhanced inverted index based approach

SI Hongwei   

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2012-07-14  Revised: 2012-10-12  Online: 2014-03-25  Published: 2014-03-25

Abstract:

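In microblog search systems, the main way of presenting information is to rank posts by search relevance and importance and return the results as a list. Scoring functions based on the vector space model are widely used in such systems. The actual value of a post's importance score, however, is never shown to users; a document's influence is visible only through its rank. For a document that lies outside the system's corpus, how can its influence relative to that corpus be measured? Ordinary search engines and information retrieval systems cannot answer this question well. This paper introduces the concept of social influence for short microblog texts and, by placing reverse position markers on top of the textual inverted index, proposes a new influence metric that answers the question above. Theoretical analysis and experiments on data sets demonstrate the effectiveness and efficiency of the algorithm.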

Key words: information retrieval; inverted index; TFIDF metric; index markers
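The abstract treats a document's rank position under a vector-space scoring function as its influence. A minimal baseline sketch of that idea follows, assuming an unnormalized TF-IDF score (tf × log(N/df), with statistics taken from the corpus only) and pre-tokenized documents. It computes the rank an out-of-corpus document would receive for a query by scoring every corpus document. The function names `build_idf`, `tfidf_score`, and `external_rank`, and the tie rule (ties resolved in the new document's favour), are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def build_idf(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)), using corpus statistics only."""
    n = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc))  # each term counted once per document
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf_score(query: Sequence[str], doc: Sequence[str], idf: Dict[str, float]) -> float:
    """Assumption: unnormalized score = sum over distinct query terms of tf(t, doc) * idf(t).
    Terms unseen in the corpus get idf 0 and contribute nothing."""
    tf = Counter(doc)
    return sum(tf[t] * idf.get(t, 0.0) for t in set(query))


def external_rank(query: Sequence[str], new_doc: Sequence[str],
                  corpus: Sequence[Sequence[str]]) -> int:
    """1-based rank an out-of-corpus document would get for `query`.
    The new document is NOT added to the corpus; ties go in its favour."""
    idf = build_idf(corpus)
    s_new = tfidf_score(query, new_doc, idf)
    # Linear scan: count corpus documents that strictly outscore the new one.
    higher = sum(1 for doc in corpus if tfidf_score(query, doc, idf) > s_new)
    return higher + 1


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"], ["b", "b", "c"], ["d"]]
    print(external_rank(["b"], ["b", "x"], corpus))  # 2
```

Scoring every corpus document per query is linear in the corpus size, which is the cost an index-level structure would aim to avoid. Cosine length normalization, which many vector-space systems apply, is omitted here to keep the arithmetic easy to follow.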

Abstract:

Presenting documents as a ranked list is the dominant approach in many search engine systems, which typically adopt vector-space-based ranking models. The ranking score of a document, however, is invisible to search engine users: only its rank position is visible, so the rank position can be regarded as a measure of the document's influence. For a document outside the corpus, how can this influence be measured? Ordinary search engines cannot answer this question. We introduce the notion of social influence on a real microblogging system and add a large number of milestones to the inverted indices in order to estimate influence scores, which allows the question above to be answered. Experiments on real data sets verify the effectiveness and efficiency of the approach.

Key words: information retrieval; inverted index; TFIDF; milestone
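The abstracts do not specify how the milestones are laid out. As a purely hypothetical illustration of why position markers inside an inverted index can help, the sketch below assumes each term's posting list is kept sorted by descending term weight and that every `step`-th entry is recorded as a milestone. Binary search over the milestones then bounds how many indexed documents outscore a new document on a single-term query, without scanning the whole list. `MilestonePostings`, `count_bounds`, `rank_bounds`, and `step` are names invented for this example, and the paper's actual marker scheme may differ.

```python
import bisect
from typing import List, Sequence, Tuple


class MilestonePostings:
    """Hypothetical per-term posting list: weights sorted in descending order,
    with a milestone kept at every `step`-th position (0, step, 2*step, ...)."""

    def __init__(self, weights: Sequence[float], step: int) -> None:
        if step < 1:
            raise ValueError("step must be >= 1")
        self.step = step
        self.n = len(weights)
        ordered = sorted(weights, reverse=True)
        self.milestones: List[float] = ordered[::step]
        # Negated copy is ascending, which is what bisect expects.
        self._neg = [-w for w in self.milestones]

    def count_bounds(self, w_new: float) -> Tuple[int, int]:
        """Bounds (lo, hi) on how many postings have weight strictly greater than w_new,
        found by binary search over the milestones only."""
        # m = number of milestones with weight > w_new (a prefix, since weights descend).
        m = bisect.bisect_left(self._neg, -w_new)
        # Milestone m-1 (position (m-1)*step) beats w_new, so at least (m-1)*step + 1 do.
        lo = (m - 1) * self.step + 1 if m > 0 else 0
        # Milestone m (position m*step), if it exists, does not beat w_new.
        hi = min(m * self.step, self.n)
        return lo, hi

    def rank_bounds(self, w_new: float) -> Tuple[int, int]:
        """Bounds on the 1-based rank a new document with weight w_new would receive."""
        lo, hi = self.count_bounds(w_new)
        return lo + 1, hi + 1


if __name__ == "__main__":
    postings = MilestonePostings([5, 4, 3, 3, 2, 1, 0.5], step=3)
    print(postings.milestones)        # [5, 3, 0.5]
    print(postings.rank_bounds(3.5))  # (2, 4); the true rank is 3
```

In this sketch `step` trades memory for precision: the true count always lies in an interval of width at most `step - 1`, and `step=1` degenerates to an exact binary search over the full list. Multi-term queries, where weights from several posting lists must be combined, are not handled here.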