• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (04): 684-692.

• Artificial Intelligence and Data Mining

A survey of Chinese text classification based on deep learning

GAO Shan, LI Shi-jie, CAI Zhi-ping

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-09-06 Revised: 2023-10-27 Accepted: 2024-04-25 Online: 2024-04-25 Published: 2024-04-18

Abstract: In the era of big data, the spread of social media has driven steady growth in text data of every kind, both online and in everyday life. This makes text classification an important technique for analyzing and managing such data. Text classification is a fundamental research topic in natural language processing. It assigns texts to categories according to their content under a given set of criteria, and it has a wide range of applications, such as sentiment analysis, topic classification, and relation classification. Deep learning, a machine learning approach based on learning representations from data, has shown good classification performance on text data. Chinese and English texts differ in form, sound, and image. Focusing on what is distinctive about Chinese text classification, this paper analyzes and explains the deep learning methods used for it and concludes by compiling the datasets commonly used for Chinese text classification.

Key words: Chinese text classification, natural language, deep learning, machine learning