• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2016, Vol. 38 ›› Issue (01): 188-194.


基于多标记学习的汽车评论文本多性能识别

张晶,李德玉,王素格   

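  1. (School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006)
  • Received: 2015-10-08 Revised: 2015-12-06 Online: 2016-01-25 Published: 2016-01-25
  • Funding:

    National Natural Science Foundation of China (61272095, 61175067); Shanxi Province Science and Technology Key Project (2011032102702); Shanxi Province Research Project for Returned Overseas Scholars (2013014); Shanxi Province Science and Technology Infrastructure Platform Construction Project (20150910010102)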

Multiple performances identification for car review texts based on multilabel learning

ZHANG Jing, LI Deyu, WANG Suge

  1. (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China)
  • Received: 2015-10-08 Revised: 2015-12-06 Online: 2016-01-25 Published: 2016-01-25

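Abstract:

Car product reviews often discuss several aspects of a vehicle's performance at once. This paper proposes a method, based on multi-label learning, for identifying the multiple performance aspects in car review texts. First, drawing on text mining methods, a multi-label text feature selection method selects the features, and the unstructured text is converted into a structured multi-label dataset. On this basis, four multi-label classification methods assign one or more aspect labels to each review document to be identified. Finally, eight multi-label evaluation metrics assess the aspect identification performance. Experiments on the Sina car review corpus show that the subset accuracy of aspect identification reaches 95%, which verifies the feasibility of the method.

Keywords: multi-label learning, text processing, car reviews, multi-aspect identification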

Abstract:

Automotive product reviews often cover several aspects of performance at once. To address this, this paper proposes a novel method, based on multilabel learning, for recognizing the multiple performance aspects discussed in car review texts. First, a multilabel text feature selection method, combined with text mining techniques, selects appropriate words as features, and the unstructured document corpus is transformed into a structured multilabel dataset. Next, four multilabel classification methods assign one or more aspect labels to each review text awaiting recognition. Finally, eight multilabel evaluation metrics assess the recognition performance. Experimental results on the Sina car review corpus show that the subset accuracy reaches 95%, which indicates that the method is feasible for recognizing the multiple aspects of automobile reviews.

Key words: multilabel learning, text processing, car reviews, multiaspect recognition
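
The abstract reports subset accuracy as its headline metric. As a minimal sketch of what that metric measures, the Python below computes subset accuracy (the share of reviews whose predicted label set matches the true set exactly) alongside Hamming loss, a common multilabel metric chosen here only for contrast. The abstract does not name its eight metrics, so the choice of Hamming loss is an assumption, and the aspect labels and predictions are hypothetical toy data, not taken from the Sina corpus.

```python
from typing import List, Set


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of reviews whose predicted label set equals the true set exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one review is required")
    exact_matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact_matches / len(y_true)


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]],
                 label_space: Set[str]) -> float:
    """Fraction of (review, label) decisions that are wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true or not label_space:
        raise ValueError("reviews and label space must be non-empty")
    # The symmetric difference holds labels that were missed or wrongly added.
    errors = sum(len((t ^ p) & label_space) for t, p in zip(y_true, y_pred))
    return errors / (len(y_true) * len(label_space))


# Hypothetical aspect labels and toy predictions (illustration only).
ASPECTS = {"power", "comfort", "fuel_economy", "appearance"}
y_true = [{"power", "comfort"}, {"fuel_economy"}, {"appearance", "comfort"}, {"power"}]
y_pred = [{"power", "comfort"}, {"fuel_economy"}, {"appearance"}, {"power", "fuel_economy"}]

print(subset_accuracy(y_true, y_pred))        # 0.5
print(hamming_loss(y_true, y_pred, ASPECTS))  # 0.125
```

In this toy data, two of the four reviews are labelled exactly right, so subset accuracy is 0.5, even though only 2 of the 16 individual label decisions are wrong. Subset accuracy is the stricter measure because a single missed or extra aspect makes the whole review count as an error.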