• A journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (07): 1292-1299.

• Artificial Intelligence and Data Mining •

A low-resource Lao text regularization task based on BiLSTM

WANG Jian1,JIANG Lin1,2,WANG Lin-qin1,2,YU Zheng-tao1,2,ZHANG Song1,2,GAO Sheng-xiang1,2   

  (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
   2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)
  • Received:2021-11-24 Revised:2022-03-11 Accepted:2023-07-25 Online:2023-07-25 Published:2023-07-11

Abstract: Text normalization (TN) is an indispensable step in the front-end text analysis of speech synthesis. Lao text normalization converts non-standard words (NSW) in Lao text into spoken-form words (SFW). To date, text normalization has not been studied for Lao, where it faces three main problems: training data are hard to obtain, linguistic expression is highly varied, and many normalizations are ambiguous. This paper carries out a text normalization task for Lao. The task is formulated as sequence labeling, in which a neural network resolves ambiguous NSWs from their context. A corpus for Lao text normalization is constructed, a neural network model predicts the labels, a self-attention mechanism is added to strengthen the dependencies between characters in the sequence, and different strategies for introducing a pre-trained language model are explored. The model achieves an accuracy of 67.59% on the test set.
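The self-attention step described above can be sketched as plain scaled dot-product attention applied to the per-character hidden states produced by the BiLSTM. This is a minimal illustrative sketch, not the paper's implementation: the function name, the single-head formulation, and the absence of learned query/key/value projections are all assumptions.

```python
import numpy as np

def self_attention(H, scale=True):
    """Scaled dot-product self-attention over a sequence of hidden states.

    H: (seq_len, d) array, e.g. BiLSTM outputs for each Lao character.
    Returns (context, weights): context vectors of the same shape as H,
    and the (seq_len, seq_len) attention weight matrix.
    """
    d = H.shape[-1]
    # Pairwise similarity between every pair of character positions.
    scores = H @ H.T
    if scale:
        scores = scores / np.sqrt(d)
    # Row-wise softmax (subtract the max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all positions' states,
    # which is how attention "deepens" cross-character dependencies.
    return weights @ H, weights
```

In a tagger such as the one the abstract outlines, the attended states would then be fed to a per-position classifier that assigns each ambiguous NSW its spoken-form label.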

Key words: Lao, text normalization, neural network, self-attention mechanism