A Topic Crawler Algorithm Based on Semantic Analysis
Received date: 2010-03-12
Revised date: 2009-06-17
Online published: 2010-09-02
Jiang Zongli, Tian Xiaoyan, Zhao Xu. A Topic Crawler Algorithm Based on Semantic Analysis [J]. Computer Engineering & Science, 2010, 32(9): 145-147. Keywords: topic crawler; subspace; semanti
The massive scale of the Web and its rapid growth make it difficult for general-purpose search engines to provide satisfactory results for topic-oriented or domain-oriented queries. This paper studies how to gather only the information relevant to a given topic, which significantly reduces the number of web pages that must be processed. By assessing the degree of relevance of each web page to the topic, the crawler gives priority to pages with a higher degree of relevance. Using a subspace-based semantic analysis technique, combined with a Bayesian mechanism and a support vector machine, we design and implement an efficient topic crawler. Experiments show that our algorithm has good accuracy and efficiency.
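
The prioritization idea in the abstract (crawl pages with higher topic relevance first) can be illustrated with a best-first frontier. The sketch below is not the authors' implementation: the `relevance` function is a simple term-overlap stand-in for the paper's subspace-based semantic analysis with Bayesian and SVM components, and the names `crawl`, `fetch`, `PAGES`, and `TOPIC` are hypothetical, introduced only for illustration. Each outgoing link is assumed to inherit the relevance score of the page it was found on.

```python
import heapq

def relevance(text, topic_terms):
    """Fraction of words in `text` that are topic terms.
    Assumption: a toy stand-in for the paper's semantic relevance model."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)

def crawl(seed, fetch, topic_terms, max_pages):
    """Best-first crawl: always expand the link whose parent page scored highest."""
    frontier = [(-1.0, seed)]   # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = relevance(text, topic_terms)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the score of the page that contains it.
                heapq.heappush(frontier, (-score, link))
    return visited

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "seed": ("crawler index", ["a", "b"]),
    "a": ("football scores today", ["c"]),
    "b": ("topic crawler semantic analysis", ["d"]),
    "c": ("weather report", []),
    "d": ("semantic crawler subspace", []),
}
TOPIC = {"topic", "crawler", "semantic", "subspace"}

def fetch(url):
    return PAGES[url]

if __name__ == "__main__":
    for url, score in crawl("seed", fetch, TOPIC, max_pages=4):
        print(url, round(score, 2))
```

With a budget of four pages, the link found on the relevant page `b` is fetched before the link found on the off-topic page `a`, so `c` is never downloaded. A real crawler would replace `relevance` with a trained classifier and `fetch` with an HTTP client.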