Research on Data Information Extraction Based on Unknown Data Sources

J4 ›› 2008, Vol. 30 ›› Issue (7): 114-117.

陈卓君, 陈军华

Online: 2008-07-01  Published: 2010-05-22

Abstract

This paper describes how wrapper and clustering techniques can be used to segment tagged web pages and locate the data sections of interest, without human intervention and without prior knowledge of the data source. Matching and indexing techniques are then used to extract the data of interest and store it in a database. Building on a study of secondary search and second-level data mining, the approach makes it possible to search and extract data without knowing the data source, and thereby to provide personalized information.

Key words: clustering, data mining, information extraction
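The abstract names the pipeline stages (segment a tagged page, cluster the segments, pick the data section, store the result) but gives no implementation detail. The following is a minimal sketch of one way such a pipeline could look, not the authors' method: it assumes that segments sharing the same tag path belong to the same cluster and that the largest repeated cluster is the data section. All names (`SegmentParser`, `cluster_by_path`, `pick_data_cluster`, `extract`, the `extracted` table, the sample `PAGE`) are hypothetical.

```python
import sqlite3
from collections import defaultdict
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags that are currently open
        self.segments = []   # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append(("/".join(self.stack), text))


def cluster_by_path(segments):
    """Assumed clustering rule: identical tag paths form one cluster."""
    clusters = defaultdict(list)
    for path, text in segments:
        clusters[path].append(text)
    return clusters


def pick_data_cluster(clusters, min_size=2):
    """Assumed heuristic: the largest repeated cluster is the data section."""
    repeated = {p: t for p, t in clusters.items() if len(t) >= min_size}
    if not repeated:
        return None, []
    path = max(repeated, key=lambda p: len(repeated[p]))
    return path, repeated[path]


def extract(page, conn):
    """Segment the page, select the data cluster, and store it in SQLite."""
    parser = SegmentParser()
    parser.feed(page)
    parser.close()
    path, texts = pick_data_cluster(cluster_by_path(parser.segments))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted (tag_path TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO extracted VALUES (?, ?)", [(path, t) for t in texts]
    )
    conn.commit()
    return path, texts


# Hypothetical sample page: three repeated list items plus unrelated text.
PAGE = """
<html><body>
  <h1>Catalog</h1>
  <ul>
    <li>Data Mining, 2006</li>
    <li>Web Wrappers, 2007</li>
    <li>Clustering Methods, 2008</li>
  </ul>
  <p>Contact us</p>
</body></html>
"""

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(extract(PAGE, connection))
```

Grouping by tag path is only a stand-in for the clustering step; a real system would likely need a similarity measure over segment structure and a separate matching stage to select the fields a user cares about.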

陈卓君, 陈军华. Research on Data Information Extraction Based on Unknown Data Sources [J]. J4, 2008, 30(7): 114-117.
