Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (04): 684-692.
GAO Shan, LI Shi-jie, CAI Zhi-ping
Received: 2023-09-06
Revised: 2023-10-27
Accepted: 2024-04-25
Online: 2024-04-25
Published: 2024-04-18
Abstract: In the era of big data, with the growing popularity of social media, text data of all kinds is accumulating rapidly both online and in daily life, so analyzing and managing such data with text classification techniques is of great significance. Text classification, a fundamental task in natural language processing, assigns texts to categories according to their content under a given standard; it is widely applied in scenarios such as sentiment analysis, topic classification, and relation classification. Deep learning, a branch of machine learning based on representation learning of data, has shown strong classification performance on text data. Chinese text differs from English text in form, sound, and symbolism. Focusing on these particularities of Chinese text classification, this paper analyzes and surveys the deep learning methods used for Chinese text classification, and finally compiles the datasets commonly used for this task.
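One particularity the abstract alludes to is that Chinese text has no explicit word boundaries, so character-level features are a common starting point before word segmentation or deep models are applied. The following is a minimal, self-contained sketch (illustrative only, not a method from the paper): character-bigram bag-of-words features with a nearest-centroid classifier.

```python
from collections import Counter
import math

def char_bigrams(text):
    # Character bigrams sidestep the need for word segmentation in Chinese.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors (dicts).
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(samples):
    # samples: list of (text, label); centroid = summed bigram counts per label.
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(char_bigrams(text))
    return centroids

def classify(text, centroids):
    # Assign the label whose centroid is most similar to the query text.
    vec = char_bigrams(text)
    return max(centroids, key=lambda lbl: cosine(vec, centroids[lbl]))

samples = [
    ("这部电影很好看", "pos"),
    ("演员表演精彩", "pos"),
    ("这部电影很难看", "neg"),
    ("剧情无聊乏味", "neg"),
]
centroids = train_centroids(samples)
print(classify("电影好看", centroids))  # → pos
```

Deep models such as the CNN, RNN, and BERT families surveyed in the paper replace these hand-built sparse counts with learned dense representations, but the character-level input granularity remains a common choice for Chinese.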
GAO Shan, LI Shi-jie, CAI Zhi-ping. A survey of Chinese text classification based on deep learning[J]. Computer Engineering & Science, 2024, 46(04): 684-692.