Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (02): 370-380.
• Artificial Intelligence and Data Mining •
CAO Jun-hang, HUANG Rui-zhang, BAI Rui-na, ZHAO Jian-hui
Abstract: Chinese text error correction is an important application of natural language processing. Chinese writing is flexible and varied, so existing error correction models cannot cover every type of error, and selecting the TOP1 result from the TOPK candidates is itself error-prone. This paper proposes DCsR (Detector-Correctors-Ranker), an integrated error correction framework for Chinese text. The framework abandons the earlier approach of assuming that the error type is known in advance and correcting with a single model. Instead, several strong error correction models are selected for different scenarios, and their outputs are integrated to recall a more comprehensive candidate set. In addition, a multi-strategy, scalable candidate sorting algorithm, built on the importance of customized features, selects the more credible correction results. The DCsR framework effectively mitigates model bias and further improves the performance of Chinese spelling error correction. Experimental results show that, compared with the best-performing single model, the DCsR framework improves the error correction F1 value by 3.93% on the public data set SIGHAN15. An ablation experiment on CGED2020 also demonstrates the effectiveness of the DCsR framework.
Key words: Chinese text error correction, DCsR framework, integrated error correction, feature importance, candidate sorting algorithm
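The recall-then-rank idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `recall_candidates`, and `rank`, the toy correctors, the feature names `model_conf` and `votes`, and the weights are all hypothetical, and the detection stage is omitted. The sketch only shows how candidates from several correctors might be merged and then ordered by a weighted sum of features.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A proposed correction plus hypothetical ranking features."""
    text: str
    features: Dict[str, float] = field(default_factory=dict)


# A corrector maps a sentence to a list of candidate corrections.
Corrector = Callable[[str], List[Candidate]]


def recall_candidates(sentence: str, correctors: List[Corrector]) -> List[Candidate]:
    """Merge the outputs of all correctors into one de-duplicated candidate set."""
    merged: Dict[str, Candidate] = {}
    for corrector in correctors:
        for cand in corrector(sentence):
            kept = merged.setdefault(cand.text, Candidate(cand.text))
            # Keep the highest value seen for each feature.
            for name, value in cand.features.items():
                kept.features[name] = max(kept.features.get(name, 0.0), value)
            # Count how many correctors proposed this candidate (assumed feature).
            kept.features["votes"] = kept.features.get("votes", 0.0) + 1.0
    return list(merged.values())


def rank(candidates: List[Candidate], weights: Dict[str, float]) -> List[Candidate]:
    """Sort candidates by a weighted sum of features, best first."""
    def score(cand: Candidate) -> float:
        return sum(w * cand.features.get(name, 0.0) for name, w in weights.items())
    return sorted(candidates, key=score, reverse=True)


# Toy correctors standing in for real models (purely illustrative).
def corrector_a(sentence: str) -> List[Candidate]:
    return [Candidate("我喜欢吃苹果", {"model_conf": 0.9})]


def corrector_b(sentence: str) -> List[Candidate]:
    return [
        Candidate("我喜欢吃苹果", {"model_conf": 0.6}),
        Candidate("我喜欢吃平锅", {"model_conf": 0.7}),
    ]


# Assumed feature weights; a real system would derive them from feature importance.
weights = {"model_conf": 1.0, "votes": 0.5}
ranked = rank(recall_candidates("我喜欢吃平果", [corrector_a, corrector_b]), weights)
print(ranked[0].text)
```

In this sketch, new ranking strategies can be added by emitting further features from the correctors and extending `weights`, which is one simple way to keep the ranker extensible.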
CAO Jun-hang, HUANG Rui-zhang, BAI Rui-na, ZHAO Jian-hui. DCsR: An integrated error correction framework for Chinese text[J]. Computer Engineering & Science, 2023, 45(02): 370-380.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2023/V45/I02/370