  • Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (4): 718-727.

• Artificial Intelligence and Data Mining •

Named entity recognition of crop diseases and pests fusing dual dictionary

ZHU Xiping, GAO Ang, XIAO Lijuan

  1. (School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu 610500, China)
  • Received: 2023-11-21 Revised: 2024-05-31 Online: 2025-04-25 Published: 2025-04-17
  • Funding:
    Sichuan Science and Technology Program (2020YFN0019)

Abstract: To address the strong domain specificity, imbalanced data types, and nested entities in crop disease and pest data, which lead to low recognition accuracy for general-purpose models, a crop disease and pest named entity recognition model that fuses dual dictionaries is proposed. First, the original character data and vocabulary data are fed into the LE-RoBERTa module and the GC-SoftLexicon module, respectively, and two independent character vectors are obtained after enhancement processing. Then, the fused character vectors are input into the BiLSTM encoding layer and the CRF decoding layer to obtain the optimal entity sequence output. Experimental results show that the model achieves an F1-score of 95.56% on the constructed crop disease and pest entity dataset and can effectively recognize crop disease and pest named entities.

Key words: named entity recognition, crop diseases and pests, agricultural dictionary, character-word fusion, attention mechanism
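
The CRF decoding step mentioned in the abstract can be illustrated with Viterbi decoding, the standard way a linear-chain CRF selects its highest-scoring tag sequence. The sketch below is a generic, minimal illustration and not the authors' implementation: the function name `viterbi_decode`, the tag names (`O`, `B-DIS`, `I-DIS`), and all scores are assumptions made for the example. In the described model, the per-character scores would come from the BiLSTM encoding layer and the transition scores would be learned by the CRF layer.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   one dict per character, mapping tag -> score.
    transitions: (previous_tag, current_tag) -> score; missing pairs score 0.0.
    tags:        the tag inventory (hypothetical names in this sketch).
    """
    if not emissions:
        return []

    # Best score of any path that ends in each tag at the first character.
    score = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []

    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score + transition.
            best_prev = max(
                tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0)
            )
            new_score[cur] = (
                score[best_prev]
                + transitions.get((best_prev, cur), 0.0)
                + emit[cur]
            )
            pointer[cur] = best_prev
        backpointers.append(pointer)
        score = new_score

    # Follow the backpointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: score[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores for a three-character disease name.
TAGS = ["O", "B-DIS", "I-DIS"]
# Assumed constraint: an entity may not continue directly after "O".
TRANSITIONS = {("O", "I-DIS"): -1e4}
EMISSIONS = [
    {"O": 0.1, "B-DIS": 2.0, "I-DIS": 1.0},
    {"O": 0.5, "B-DIS": 0.2, "I-DIS": 1.5},
    {"O": 0.3, "B-DIS": 0.1, "I-DIS": 1.2},
]

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
# prints ['B-DIS', 'I-DIS', 'I-DIS']
```

The transition scores are what distinguish CRF decoding from choosing the best tag per character independently: a large negative score on `("O", "I-DIS")` rules out tag sequences in which an entity continues without having begun.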