J4 ›› 2015, Vol. 37 ›› Issue (12): 2331-2338.
• Articles •
HUI Haotian,LI Yunjian,QIAN Longhua,ZHOU Guodong
Abstract:
In addition to machine translation, parallel corpora play an important role in information retrieval, information extraction, and knowledge acquisition. However, traditional parallel corpora are aligned only at the sentence level, which limits their value for research on cross-language natural language processing. To address this, we build on the OntoNotes corpus to construct a high-quality Chinese-English parallel corpus for information extraction, combining automatic extraction, automatic mapping, and manual annotation. The corpus annotates entities and the relations between them, and aligns Chinese and English at both the entity level and the relation level. It can therefore facilitate comparative studies of information extraction in Chinese and English, reveal differences in semantic expression between the two languages, and provide a valuable platform for research on cross-language information extraction.
Key words: named entity; semantic relation; bilingual mapping; parallel corpus
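To make the idea of alignment at both the entity level and the relation level concrete, here is a minimal sketch in Python. It is not the corpus's actual format: the class names, field names, type labels, and the toy sentence pair are all hypothetical, and the check assumes that aligned relations keep the same argument order in both languages, which a real corpus may not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One entity mention in one language (hypothetical schema)."""
    entity_id: str
    lang: str   # "zh" or "en"
    text: str
    etype: str  # e.g. "PER", "ORG"; the label set is an assumption


@dataclass(frozen=True)
class Relation:
    """A typed relation between two entities of the same sentence."""
    relation_id: str
    rtype: str
    arg1: str   # entity_id of the first argument
    arg2: str   # entity_id of the second argument


def relation_is_aligned(zh_rel, en_rel, entity_map):
    """Return True if two relations correspond under the entity mapping.

    entity_map maps Chinese entity ids to English entity ids. Two relations
    count as aligned when their types match and both arguments map across.
    """
    return (
        zh_rel.rtype == en_rel.rtype
        and entity_map.get(zh_rel.arg1) == en_rel.arg1
        and entity_map.get(zh_rel.arg2) == en_rel.arg2
    )


# Toy sentence pair, invented for illustration only.
zh_entities = [
    Entity("zh1", "zh", "李明", "PER"),
    Entity("zh2", "zh", "苏州大学", "ORG"),
]
en_entities = [
    Entity("en1", "en", "Li Ming", "PER"),
    Entity("en2", "en", "Soochow University", "ORG"),
]

# Entity-level alignment: Chinese entity id -> English entity id.
entity_map = {"zh1": "en1", "zh2": "en2"}

zh_rel = Relation("zr1", "EMPLOYMENT", "zh1", "zh2")
en_rel = Relation("er1", "EMPLOYMENT", "en1", "en2")

# Relation-level alignment follows from the entity mapping plus the type.
print(relation_is_aligned(zh_rel, en_rel, entity_map))
```

In this sketch, relation alignment is derived from the entity mapping rather than stored separately, so an error in the entity-level alignment shows up immediately as a failed relation-level check.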
HUI Haotian, LI Yunjian, QIAN Longhua, ZHOU Guodong. A Chinese-English parallel corpus for information extraction [J]. J4, 2015, 37(12): 2331-2338.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2015/V37/I12/2331