Feature Selection for Multi-Class Text Categorization
Received date: 2009-06-02
Revised date: 2009-10-10
Online published: 2010-07-28
Funding
National 863 Program of China (2006AA01Z451, 2007AA01Z474, 2007AA010502); National Natural Science Foundation of China (60873204); NCET060928
Feature selection is an important preprocessing step in data mining and machine learning, and it has received wide attention in recent years. The high dimensionality of text data often degrades the efficiency of data mining tasks such as classification, so feature selection is commonly used as a dimension-reduction step in the text categorization process. With the rapid development of classification techniques and the increasingly fine-grained division of categories, multi-class text categorization poses further challenges for feature selection methods. Targeting multi-class text categorization, this paper describes the main challenges that current feature selection methods face and presents the main types of multi-class feature selection methods. Following the development of the related research, from the simple to the more difficult, it summarizes and reviews how state-of-the-art multi-class feature selection algorithms are applied. Finally, it concludes the paper and proposes possible directions for future research.
Wang Bo, Jia Yan, Yang Shuqiang, Han Weihong. Feature Selection for Multi-Class Text Categorization[J]. Computer Engineering & Science, 2010, 32(8): 90-93. DOI: 10.3969/j.issn.1007130X.2010.