A ChineseEnglish parallel corpus for information extraction

J4 ›› 2015, Vol. 37 ›› Issue (12): 2331-2338.

• Articles •

A ChineseEnglish parallel corpus for information extraction

HUI Haotian，LI Yunjian，QIAN Longhua，ZHOU Guodong

(1. Natural Language Processing Lab, Soochow University, Suzhou 215006; 2. School of Computer Science & Technology, Soochow University, Suzhou 215006, China)

Received: 2015-08-26 Revised: 2015-10-21 Online: 2015-12-25 Published: 2015-12-25

Abstract

Beyond machine translation, parallel corpora play an important role in information retrieval, information extraction, and knowledge acquisition. However, traditional parallel corpora are aligned only at the sentence level, which limits their value for research on cross-language natural language processing. To address this, we build on OntoNotes to construct a high-quality Chinese-English parallel corpus for information extraction, combining automatic extraction, automatic mapping, and manual annotation. The corpus annotates entities and the relations between them, and aligns Chinese and English at both the entity level and the relation level. It can therefore support comparative studies of information extraction in Chinese and English, reveal differences in semantic expression between the two languages, and provide a valuable platform for research on cross-language information extraction.

Key words: named entity; semantic relation; bilingual mapping; parallel corpus

HUI Haotian，LI Yunjian，QIAN Longhua，ZHOU Guodong. A ChineseEnglish parallel corpus for information extraction [J]. J4, 2015, 37(12): 2331-2338.

A ChineseEnglish parallel corpus for information extraction

PDF

Knowledge

Abstract

Cite this article

share this article

Related Articles 0

Recommended Articles

Metrics

Comments