• Official Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (05): 901-909.

• Artificial Intelligence and Data Mining •

An address matching algorithm based on hybrid neural network model and attention mechanism

CHEN Jian-peng, CHEN Jian, SHE Xiang-rong, SHUI Xin-ying, CHEN Gang

  1. (Yangtze River Delta Information Intelligence Innovation Research Institute, Wuhu 241000, Anhui, China)

  • Received: 2021-09-26 Revised: 2021-12-06 Accepted: 2022-05-25 Online: 2022-05-25 Published: 2022-05-24
  • Supported by:
    Key Research and Development Program of Anhui Province (202104a05020071)

Abstract: Standardizing Chinese geographic addresses plays a crucial role in the construction of smart cities. Traditional address standardization techniques usually rely on character-level text similarity or rule-base matching, and they handle complex, special, or redundant addresses poorly. This paper recasts address standardization as a similarity matching task over address pairs and proposes an address matching algorithm that combines an attention mechanism with multi-level semantic representation. First, a standard address tree is constructed with a Trie, following the particular grammatical structure of address text. Second, based on the attention mechanism, a Bi-LSTM network and a CNN network generate multi-level semantic representations of each address pair. Finally, the similarity is computed from the Manhattan distance. On a self-built dataset, the proposed SGAM (Symmetrical Geographic Address Matching) model reaches a matching accuracy of 91.22%, which is 4%~10% higher than TextRCNN, FastText, the attention-based convolutional neural network (ABCNN), and other models, indicating that SGAM performs better on the address matching task.


Key words: geographic address, text similarity calculation, attention mechanism, hybrid neural network, smart city
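
The first and last steps named in the abstract, the Trie-based standard address tree and the Manhattan-distance similarity, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that addresses are already segmented into ordered elements, that the vectors stand in for the outputs of the Bi-LSTM/CNN encoder (which is not reproduced here), and that the distance is mapped to a score with exp(-d), a form common in Siamese networks but not confirmed by the abstract. The names `AddressTrie`, `candidates`, and `manhattan_similarity` are hypothetical.

```python
import math
from typing import Dict, List


class AddressTrieNode:
    """One address element (for example a province, city, or road) in the tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "AddressTrieNode"] = {}
        self.is_address_end = False


class AddressTrie:
    """Hypothetical standard address tree keyed by address elements, not characters."""

    def __init__(self) -> None:
        self.root = AddressTrieNode()

    def insert(self, elements: List[str]) -> None:
        """Add one standard address given as an ordered list of elements."""
        node = self.root
        for element in elements:
            node = node.children.setdefault(element, AddressTrieNode())
        node.is_address_end = True

    def candidates(self, prefix: List[str]) -> List[List[str]]:
        """Return every stored address that starts with the given elements."""
        node = self.root
        for element in prefix:
            if element not in node.children:
                return []
            node = node.children[element]
        results: List[List[str]] = []
        stack = [(node, list(prefix))]
        while stack:
            current, path = stack.pop()
            if current.is_address_end:
                results.append(path)
            for element, child in current.children.items():
                stack.append((child, path + [element]))
        return sorted(results)


def manhattan_similarity(u: List[float], v: List[float]) -> float:
    """Map the L1 distance between two vectors into (0, 1].

    The exp(-distance) mapping is an assumption, not taken from the paper.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-distance)
```

In a pipeline of this shape, the tree would narrow a query address down to a few standard candidates, and the similarity score would rank those candidates once both sides are encoded as vectors.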