Small Sample and Semi-Supervised Learning for Finding Similar Samples

Qin Fei, Yang Yan

doi:10.3969/j.issn.1007130X.2010.

Computer Engineering & Science (计算机工程与科学), 2010, Vol. 32, Issue 9: 127-129


Article


(School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China)

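Qin Fei (b. 1985), male, from Feicheng, Shandong; master's degree; research interest: data mining. Yang Yan, PhD, professor; research interests: computational intelligence and data mining.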

Received: 2010-03-13

Revised: 2010-06-10

Published online: 2010-09-02


Small Sample and SemiSupervized Learningfor Finding Similar Samples

Expand

（School of Information Science and Technology,Southwest Jiaotong University,Chengdu 610031,China）

Received date: 2010-03-13

Revised date: 2010-06-10

Online published: 2010-09-02

Fold

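Abstract (translated from the Chinese)

Traditional text classification methods need a large number of labeled samples to obtain a good text classifier. In real text classification applications, however, large numbers of labeled samples are usually hard to obtain, so how to achieve good classification performance using a small number of labeled samples and a large number of unlabeled samples has become a hot research topic. To this end, this paper proposes a new method for enlarging the labeled sample set: the method first extracts the representative features of each class from the labeled sample set, and then, based on these features, searches the unlabeled sample set for similar samples and adds them to the labeled sample set. Experiments show that the method effectively improves classification performance.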

Cite this article

Qin Fei, Yang Yan. Small Sample and Semi-Supervised Learning for Finding Similar Samples[J]. Computer Engineering & Science, 2010, 32(9): 127-129. DOI: 10.3969/j.issn.1007130X.2010.

Abstract

Traditional approaches to building text classifiers require a large number of labeled documents to train a good classifier. In real-life text classification applications, however, large numbers of labeled documents are difficult to obtain, so how to achieve good results with a few labeled documents and many unlabeled ones has become a hot research topic. This paper proposes a new method for expanding the set of labeled documents: first, a set of representative features is extracted from the labeled documents; then, according to these features, similar samples are selected from the unlabeled documents and added to the labeled set. Experiments show that the method effectively improves classification results.

Keywords: text classification; representative features; similar samples
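
The two steps described in the abstract (extract representative features per class, then pull similar unlabeled samples into the labeled set) can be illustrated with a minimal Python sketch. The abstract does not say how features are scored or how similarity is measured, so everything specific here is an assumption made for illustration: the whitespace tokenizer, the "frequency in class minus frequency elsewhere" score, the term-overlap similarity, and the names `representative_features`, `expand_labeled_set`, `top_k`, and `min_overlap` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; the paper's preprocessing is not given)."""
    return text.lower().split()


def representative_features(labeled, top_k=3):
    """Return, per class, the top_k terms scored by (count in class) minus
    (count in all other classes). This scoring rule is a hypothetical stand-in."""
    per_class = {}
    for text, label in labeled:
        per_class.setdefault(label, Counter()).update(tokenize(text))
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    features = {}
    for label, counts in per_class.items():
        # 2*c - total[t] equals c minus the count of t in the other classes.
        scores = {t: 2 * c - total[t] for t, c in counts.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features[label] = set(ranked[:top_k])
    return features


def expand_labeled_set(labeled, unlabeled, top_k=3, min_overlap=2):
    """Label an unlabeled document with a class when it shares at least
    min_overlap terms with that class's features and no other class ties.
    Returns (expanded labeled set, documents that stayed unlabeled)."""
    features = representative_features(labeled, top_k)
    expanded = list(labeled)
    remaining = []
    for text in unlabeled:
        tokens = set(tokenize(text))
        overlap = {label: len(tokens & feats) for label, feats in features.items()}
        best = max(overlap, key=overlap.get)
        tied = [label for label, n in overlap.items() if n == overlap[best]]
        if overlap[best] >= min_overlap and len(tied) == 1:
            expanded.append((text, best))
        else:
            remaining.append(text)
    return expanded, remaining


# Toy data, invented for this example only.
labeled = [
    ("football match goal team", "sports"),
    ("team wins football cup", "sports"),
    ("stock market shares fall", "finance"),
    ("bank shares market rise", "finance"),
]
unlabeled = [
    "the team scored in the football final",
    "shares drop as market closes",
    "weather is sunny today",
]
expanded, remaining = expand_labeled_set(labeled, unlabeled)
```

A classifier would then be trained on `expanded` rather than on the original labeled set. The conservative tie and threshold checks reflect one possible design choice: a wrongly labeled sample added at this stage would feed errors into later training, so ambiguous documents are left unlabeled.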
