• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

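Sentiment analysis with piecewise convolution neural network

DU Changshun (杜昌顺), HUANG Lei (黄磊)

  1. (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2016-05-06 Revised: 2016-07-01 Online: 2017-01-25 Published: 2017-01-25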

Abstract:

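Text sentiment analysis is an important task in fields such as online public opinion analysis, product reviews, and data mining. Because online data is growing rapidly, analysis that relies on hand-crafted features or on traditional natural language processing (NLP) parsing tools is not only inaccurate but also time-consuming and labor-intensive.
Moreover, traditional convolutional neural network (CNN) models do not take sentence structure into account and overfit easily during training. To address these two shortcomings, we use a deep-learning-based CNN model to analyze the sentiment polarity of text. A piecewise pooling strategy brings sentence structure into the model by extracting the main features of each structural segment of a sentence separately, and the Dropout algorithm is introduced to avoid overfitting and improve generalization. Experimental results show that both the piecewise pooling strategy and the Dropout algorithm improve model performance. The proposed method reaches 91% classification accuracy on a Chinese hotel review dataset and 45.9% accuracy on the five-class task of the English Stanford Sentiment Treebank, both significantly higher than the baseline models.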
 

Keywords: sentiment analysis, deep learning, convolutional neural network, piecewise pooling, Dropout algorithm
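
The piecewise pooling strategy in the abstract above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how a sentence is split into segments, so the hypothetical helper `equal_boundaries` cuts it into nearly equal contiguous pieces. A structure-based rule (for example, cutting at punctuation) could supply the boundaries instead. All function names here are illustrative.

```python
from typing import List, Sequence

# One row per convolution filter; each row holds that filter's activations
# at every position of the sentence (the sentence's "feature vector").
FeatureMap = List[List[float]]


def equal_boundaries(length: int, num_segments: int) -> List[int]:
    """Hypothetical segmentation rule: cut points that split `length`
    positions into `num_segments` contiguous, non-empty, nearly equal pieces.
    The paper's actual rule is not stated in the abstract."""
    if not 1 <= num_segments <= length:
        raise ValueError("need 1 <= num_segments <= length")
    return [i * length // num_segments for i in range(num_segments + 1)]


def piecewise_max_pool(feature_map: FeatureMap, boundaries: Sequence[int]) -> List[float]:
    """Max-pool each segment of each filter's activations separately and
    concatenate the results: output length = num_filters * num_segments."""
    pooled: List[float] = []
    for row in feature_map:
        for start, end in zip(boundaries, boundaries[1:]):
            pooled.append(max(row[start:end]))
    return pooled


def max_pool(feature_map: FeatureMap) -> List[float]:
    """Ordinary single-segment max pooling, for comparison."""
    return [max(row) for row in feature_map]


if __name__ == "__main__":
    fmap = [
        [0.1, 0.9, 0.3, 0.2, 0.8, 0.4],  # filter 1 over 6 positions
        [0.5, 0.2, 0.7, 0.6, 0.1, 0.3],  # filter 2 over 6 positions
    ]
    cuts = equal_boundaries(len(fmap[0]), 3)  # [0, 2, 4, 6]
    print(max_pool(fmap))                     # [0.9, 0.7]
    print(piecewise_max_pool(fmap, cuts))     # [0.9, 0.3, 0.8, 0.5, 0.7, 0.3]
```

Ordinary max pooling keeps one value per filter, while piecewise pooling keeps one value per filter per segment, so the pooled vector retains coarse information about where in the sentence each feature fired.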

Abstract:

Text sentiment analysis is an important task in fields such as online public opinion analysis, product evaluation, and data mining. As data volumes grow, traditional approaches such as manual feature engineering and NLP parsing tools cannot handle the task well because of their low accuracy and high cost. We therefore apply a deep learning model, the convolutional neural network (CNN), to the task. However, the traditional CNN ignores the structural information of sentences and is prone to overfitting. To address these two problems, we first design a piecewise convolutional neural network (PCNN) that incorporates structural features: the feature vector of a sentence is divided into several segments, and max pooling is applied to each segment separately. We then introduce the Dropout algorithm to prevent overfitting and improve the model's generalization ability. Experiments use two datasets: a Chinese hotel review dataset and the Stanford Sentiment Treebank. Results on both datasets show that piecewise pooling and Dropout each improve performance. The proposed model achieves 91% accuracy on the Chinese dataset and 45.9% on the five-class task of the English dataset, higher than all of the baseline systems.

Key words: sentiment analysis, deep learning, piecewise convolutional neural network, piecewise pooling, Dropout algorithm
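
The Dropout step mentioned in the abstract can likewise be sketched. Assumptions: the abstract gives neither the drop rate nor where Dropout is applied. This sketch uses inverted dropout, a common variant that rescales surviving activations during training (the original formulation instead rescales weights at test time). It is applied to a pooled feature vector with a drop probability of 0.5 chosen only for illustration. The function name `dropout` is hypothetical.

```python
import random
from typing import List, Optional


def dropout(features: List[float], drop_prob: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: during training, zero each feature with probability
    `drop_prob` and scale survivors by 1 / (1 - drop_prob), so the expected
    value of every feature is unchanged; at inference, return a copy as-is."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(features)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Keep each feature independently with probability keep_prob.
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]


if __name__ == "__main__":
    pooled = [0.9, 0.3, 0.8, 0.5, 0.7, 0.3]  # e.g. a piecewise-pooled vector
    # Training: a random subset of features is zeroed, the rest are doubled.
    print(dropout(pooled, 0.5, training=True, rng=random.Random(0)))
    # Inference: identical to `pooled`.
    print(dropout(pooled, 0.5, training=False))
```

Because a different random subset of features is dropped on every training step, the classifier cannot rely on any single pooled feature, which is the mechanism by which Dropout reduces overfitting.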