Distributed representation of Chinese and Thai words based on cross-lingual corpus

J4, 2015, Vol. 37, Issue (12): 2358-2365.

ZHANG Jinpeng1,2,ZHOU Lanjiang1,2,XIAN Yantuan1,2,YU Zhengtao1,2,HE Silan3

(1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500;
2. The Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500;
3. School of Science, Kunming University of Science and Technology, Kunming 650500, China)

Received: 2015-08-20; Revised: 2015-10-17; Online: 2015-12-25; Published: 2015-12-25

Abstract

Word representation is a fundamental research topic in natural language processing. Distributed representations of monolingual words have already produced good results in research on neural probabilistic language (NPL) models, whereas distributed representations of cross-lingual words have received little study, either in China or abroad. To address this gap, and given that nouns and verbs have similar distributions in Chinese and Thai, we embed mutually translated words, synonyms, and hypernyms (superordinates) into a Chinese corpus using a weakly supervised learning extension approach and other methods, and thereby learn distributed representations of Thai words in a Chinese-Thai cross-lingual environment. We then apply the learned cross-lingual word representations to two tasks: computing the similarity of bilingual texts and classifying a mixed corpus of Chinese and Thai texts. Experimental results show that the proposed method performs satisfactorily on both tasks.
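The abstract only outlines the pipeline, so the following is a minimal sketch of the underlying idea, not the authors' implementation. It assumes pre-segmented sentences, a tiny hypothetical Chinese-Thai lexicon (`LEXICON`), and a simple count-based co-occurrence model standing in for the neural probabilistic language model the paper actually trains. One plausible way to "embed" translations into a Chinese corpus is to add a copy of each sentence in which a single lexicon word is swapped for its Thai counterpart: the Thai word then inherits the Chinese contexts of its translation and ends up with a similar vector. Text similarity is taken as the cosine between summed word vectors. All function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (not from the paper): Chinese word -> Thai
# counterparts. Synonyms or hypernyms could be added to the lists the same way.
LEXICON = {
    "苹果": ["แอปเปิ้ล"],  # apple
    "水果": ["ผลไม้"],      # fruit
    "吃": ["กิน"],          # eat
}


def build_mixed_corpus(sentences, lexicon):
    """Keep each segmented Chinese sentence and add one copy per Thai
    counterpart of each lexicon word, with only that one word swapped."""
    mixed = []
    for tokens in sentences:
        tokens = list(tokens)
        mixed.append(tokens)
        for i, tok in enumerate(tokens):
            for thai in lexicon.get(tok, []):
                mixed.append(tokens[:i] + [thai] + tokens[i + 1:])
    return mixed


def context_vectors(corpus, window=2):
    """Count-based stand-in for an embedding model: each word is represented
    by counts of the words within `window` positions of it."""
    vecs = defaultdict(Counter)
    for tokens in corpus:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(c * v[k] for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def text_similarity(text_a, text_b, vecs):
    """Sum the word vectors of each text (cosine ignores the 1/n factor of
    an average) and compare the two sums."""
    def embed(tokens):
        total = Counter()
        for t in tokens:
            total.update(vecs.get(t, Counter()))
        return total
    return cosine(embed(text_a), embed(text_b))


if __name__ == "__main__":
    zh = [["我", "吃", "苹果"], ["他", "吃", "水果"], ["我", "喜欢", "苹果"]]
    corpus = build_mixed_corpus(zh, LEXICON)
    vecs = context_vectors(corpus)
    print(len(corpus))
    print(round(cosine(vecs["苹果"], vecs["แอปเปิ้ล"]), 3))
    print(round(cosine(vecs["苹果"], vecs["ผลไม้"]), 3))
    print(round(text_similarity(["我", "吃", "苹果"], ["我", "กิน", "แอปเปิ้ล"], vecs), 3))
```

With the three toy sentences, the mixed corpus grows to 8 sentences, and แอปเปิ้ล (apple) ends up much closer to 苹果 than ผลไม้ (fruit) does. Swapping only one word per copy keeps the surrounding Chinese context intact, which is what lets a Thai word share a distribution with its translation. A realistic setup would train neural embeddings on a large corpus rather than use raw counts.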

Key words: weakly supervised learning extension; cross-lingual corpus; cross-lingual word distribution representations; neural probabilistic language model

ZHANG Jinpeng1,2,ZHOU Lanjiang1,2,XIAN Yantuan1,2,YU Zhengtao1,2,HE Silan3. Distributed representation of Chinese and Thai
words based on cross-lingual corpus [J]. J4, 2015, 37(12): 2358-2365.