• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2014, Vol. 36, Issue (02): 359-366.


Effective Mining of Product Features from Chinese Reviews Based on CRF

LV Pin1,2,3, ZHONG Luo1, CAI Dunbo2,3, WU Yuntao2,3

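  (1. College of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070; 2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, Hubei 430073;
    3. Hubei Province Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei 430073)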
  • Received: 2012-09-28  Revised: 2013-02-02  Online: 2014-02-25  Published: 2014-02-27
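  • Funding:

    National Natural Science Foundation of China, Young Scientists Fund (61103136); Program for Outstanding Young and Middle-aged Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province (T201206); Open Fund of the Hubei Province Key Laboratory of Intelligent Robot (200906)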

Effective Mining of Product Features from Chinese Reviews Based on CRF

LV Pin1,2,3,ZHONG Luo1,CAI Dunbo2,3,WU Yuntao2,3   

  1. (1.College of Computer Science and Technology,Wuhan University of Technology,Wuhan 430070;
    2.School of Computer Science and Engineering,Wuhan Institute of Technology,Wuhan 430073;
    3.Hubei Province Key Laboratory of Intelligent Robot,Wuhan Institute of Technology,Wuhan 430073,China)
  • Received:2012-09-28 Revised:2013-02-02 Online:2014-02-25 Published:2014-02-27

Abstract:

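Aspect-level opinion mining usually comprises three tasks: extracting product features from customer reviews, identifying the opinion words associated with those features, and determining the polarity of the opinions. To carry out aspect-level opinion mining on Chinese reviews, this paper proposes a conditional random field (CRF) approach organized into four main steps: data preprocessing, training-set preparation, definition of the learning functions for the CRF model, and application of the model to label new review data. The method was evaluated experimentally on a corpus of Chinese reviews of five real products. For the extraction of the different types of opinion elements, most of the evaluation metrics reach or exceed 80%. To further verify the effectiveness of the method, a significance test of the differences was performed; it shows little difference in performance between CRF-based aspect-level opinion mining of Chinese reviews and of English reviews. Finally, the aspect-extraction precision and sentiment-classification accuracy of three different methods were compared: the CRF method outperforms both a lexicalized hidden Markov model and association rule mining.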

Keywords: conditional random field; aspect-level opinion mining; opinion elements
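Steps 3 and 4 of the abstract (defining feature functions for a CRF and applying the model to label new reviews) can be illustrated with a minimal linear-chain CRF decoder. This is a sketch under stated assumptions, not the authors' implementation: the tag set (`FEATURE`, `OPINION_POS`, `OPINION_NEG`, `O`), the feature templates (word identity, part-of-speech tag, tag transition) and the hand-set `WEIGHTS` are all hypothetical. Training is skipped; a real CRF would learn the weights from the annotated training set by maximizing the conditional log-likelihood. Decoding uses the Viterbi algorithm on a review that is assumed to be already word-segmented and POS-tagged (step 1).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tag set (an assumption, not the paper's annotation scheme):
# product feature, positive opinion word, negative opinion word, other.
TAGS = ["FEATURE", "OPINION_POS", "OPINION_NEG", "O"]

# Hypothetical hand-set weights; a trained CRF would learn these values.
WEIGHTS: Dict[str, float] = {
    "pos=n|tag=FEATURE": 2.0,        # nouns are often product features
    "pos=d|tag=O": 1.0,              # adverbs such as 很 ("very")
    "pos=a|tag=OPINION_POS": 1.0,    # adjectives are often opinion words
    "pos=a|tag=OPINION_NEG": 1.0,
    "word=好|tag=OPINION_POS": 2.0,  # 好 = "good"
    "word=差|tag=OPINION_NEG": 2.0,  # 差 = "poor"
    "trans=FEATURE->O": 0.5,
    "trans=O->OPINION_POS": 0.5,
    "trans=O->OPINION_NEG": 0.5,
}


def feature_names(tokens: List[str], pos_tags: List[str], t: int,
                  prev_tag: Optional[str], tag: str) -> List[str]:
    """Binary feature functions f_k(y_{t-1}, y_t, x, t) that fire at position t."""
    names = [f"word={tokens[t]}|tag={tag}", f"pos={pos_tags[t]}|tag={tag}"]
    if prev_tag is not None:
        names.append(f"trans={prev_tag}->{tag}")
    return names


def local_score(weights: Dict[str, float], tokens: List[str], pos_tags: List[str],
                t: int, prev_tag: Optional[str], tag: str) -> float:
    """Sum of w_k * f_k over the features firing at position t."""
    return sum(weights.get(n, 0.0) for n in feature_names(tokens, pos_tags, t, prev_tag, tag))


def viterbi(weights: Dict[str, float], tokens: List[str], pos_tags: List[str]) -> List[str]:
    """Most probable tag sequence; the partition function Z(x) is constant
    for a given sentence, so comparing unnormalized scores is enough."""
    if not tokens:
        return []
    # best[t][tag] = (best score of a path ending in `tag` at t, previous tag)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {tag: (local_score(weights, tokens, pos_tags, 0, None, tag), None) for tag in TAGS}
    ]
    for t in range(1, len(tokens)):
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for tag in TAGS:
            column[tag] = max(
                ((best[t - 1][prev][0] + local_score(weights, tokens, pos_tags, t, prev, tag), prev)
                 for prev in TAGS),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag][0])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path


if __name__ == "__main__":
    # 屏幕 很 好 = "the screen is very good"
    print(viterbi(WEIGHTS, ["屏幕", "很", "好"], ["n", "d", "a"]))
    # prints ['FEATURE', 'O', 'OPINION_POS']
```

With these illustrative weights, 屏幕 ("screen") is labeled as a product feature and 好 as a positive opinion word; swapping 好 for 差 flips the opinion tag to `OPINION_NEG`, which is how one labeling pass can cover feature extraction, opinion-word identification and polarity together.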

Abstract:

The task of aspectlevel opinion mining usually include the extraction of product entities from consumer reviews, the identification of opinion words that are associated with the entities, and the determination of these opinion’s polarities. Aiming at realizing aspectlevel opinion mining for Chinese reviews, the paper proposes the four major steps: preprocessing; preparing the training set to learn the model; defining learning functions for conditional random field model; and applying the model to label new review data. At the same time, our experiments on the real Chinese reviews of five types of products show that the conditional random field based method can achieve 80% in most of performance indicators of extracted different types of review opinion elements. In order to verify the effectiveness of the proposed method, a test of the significance of difference is involved. Experiments report that there is scarcely difference of performance on conditional random field based method for both Chinese reviews and English reviews. Finally, we compare the precision of aspect extraction and the accuracy of sentiment classification based on three different methods, and the result shows that CRFbased method outperforms the other two such as lexicalized hidden markov model and association rule mining.

Key words: conditional random field, aspectlevel opinion mining, opinion elements