• Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (06): 1116-1122.

• Artificial Intelligence and Data Mining

Medical text classification based on neural network

XU Lang (1,2), LI Dai-wei (1,2), ZHANG Hai-qing (1,2), TANG Dan (1,2), HE Lei (1,2), YU Xi (3)

  (1. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China;
    2. Sichuan Province Engineering Technology Research Center of Support Software of Informatization Application, Chengdu 610225, Sichuan, China;
    3. Stirling College, Chengdu University, Chengdu 610106, Sichuan, China)

  • Received: 2022-09-27; Revised: 2022-11-15; Accepted: 2023-06-25; Online: 2023-06-25; Published: 2023-06-16
  • Funding:
    European Union project (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP); National Natural Science Foundation of China (61602604); Sichuan Provincial Department of Science and Technology projects (2021YFH0107, 2022YFS0544, 2022NSFSC0571)

Abstract: Traditional methods for classifying medical text ignore textual context: they treat each word as independent of the others, so they cannot represent semantic information, and both text representation and classification performance are poor. They also depend on manual feature engineering, which limits their ability to generalize. To address the low efficiency and low accuracy of medical text classification, this paper proposes CMNN, a medical text classification model built on bidirectional encoder representations from Transformers (BERT), a convolutional neural network (CNN), and a bidirectional long short-term memory (BiLSTM) network. The model uses BERT to produce word vectors and combines the CNN and BiLSTM to capture local latent features and contextual information. CMNN is then compared with the traditional deep learning models TextCNN and TextRNN in terms of accuracy, precision, recall, and F1 score. Experimental results show that CMNN outperforms the other models overall on all evaluation metrics, improving accuracy by 1.69% to 5.91%.

Key words: natural language processing, medical text classification, BERT, CNN, BiLSTM
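The abstract states that CMNN feeds BERT word vectors into a CNN (for local features) and a BiLSTM (for context), but it does not specify how the two branches are wired together. The NumPy sketch below assumes one plausible arrangement: parallel CNN and BiLSTM branches whose outputs are concatenated and passed to a softmax classifier. Everything here is hypothetical, not the authors' implementation: the function names, the layer sizes, the 4-class output, the untrained random weights, and especially the input, which is random vectors standing in for BERT output (768 dimensions, matching BERT-base).

```python
import numpy as np

# Hypothetical fixed seed so the sketch is reproducible.
rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_maxpool(x, kernels, bias):
    """CNN branch: slide each filter over token positions, apply ReLU,
    then max-pool over time. x: (seq_len, d); kernels: (n_filters, width, d)."""
    n_filters, width, _ = kernels.shape
    n_windows = x.shape[0] - width + 1
    feats = np.empty((n_windows, n_filters))
    for t in range(n_windows):
        window = x[t:t + width]  # (width, d)
        feats[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(feats, 0.0).max(axis=0)  # (n_filters,)


def lstm_last_state(x, W, U, b):
    """One LSTM direction; returns the final hidden state.
    W: (4h, d), U: (4h, h), b: (4h,); gates ordered input, forget, cell, output."""
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:h_dim])
        f = sigmoid(z[h_dim:2 * h_dim])
        g = np.tanh(z[2 * h_dim:3 * h_dim])
        o = sigmoid(z[3 * h_dim:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def init_params(d, n_filters, width, h_dim, n_classes):
    """Random (untrained) weights; a real model would learn these."""
    def lstm_params():
        return (rng.normal(0, 0.1, (4 * h_dim, d)),
                rng.normal(0, 0.1, (4 * h_dim, h_dim)),
                np.zeros(4 * h_dim))
    return {
        "kernels": rng.normal(0, 0.1, (n_filters, width, d)),
        "conv_bias": np.zeros(n_filters),
        "lstm_fwd": lstm_params(),
        "lstm_bwd": lstm_params(),
        "W_out": rng.normal(0, 0.1, (n_classes, n_filters + 2 * h_dim)),
        "b_out": np.zeros(n_classes),
    }


def cnn_bilstm_forward(token_vectors, p):
    """Assumed wiring: parallel CNN and BiLSTM branches, concatenated, then softmax."""
    local = conv1d_maxpool(token_vectors, p["kernels"], p["conv_bias"])
    fwd = lstm_last_state(token_vectors, *p["lstm_fwd"])
    bwd = lstm_last_state(token_vectors[::-1], *p["lstm_bwd"])  # right-to-left pass
    logits = p["W_out"] @ np.concatenate([local, fwd, bwd]) + p["b_out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Stand-in for BERT output: 12 tokens x 768 dims (random, NOT real BERT vectors).
token_vectors = rng.normal(size=(12, 768))
params = init_params(d=768, n_filters=64, width=3, h_dim=32, n_classes=4)
probs = cnn_bilstm_forward(token_vectors, params)
print(probs.round(3))
```

In this sketch the CNN branch keeps the strongest n-gram response per filter, while the two LSTM passes summarize the sequence from each direction. A serial design (CNN output fed into the BiLSTM) would be an equally plausible reading of the abstract.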
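The comparison with TextCNN and TextRNN uses accuracy, precision, recall, and F1 score. For multi-class medical text, precision, recall, and F1 must be averaged across classes, and the abstract does not say which averaging was used. The sketch below assumes macro averaging (an unweighted mean over classes). The function name and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.
    Macro averaging is an assumption; the paper's averaging scheme is not stated."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (hypothetical department categories, not data from the paper).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
m = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

With a skewed class distribution, macro and weighted averages can differ noticeably, so the choice matters when comparing reported scores across models.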