• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2008, Vol. 30 ›› Issue (2): 64-66.

• Papers •

Automatic Filtering of Domain-Specific Terms Based on a Clustering Method

李勇   

  • Publication date: 2008-02-01   Release date: 2010-05-19

Abstract:

To build a domain-specific term dictionary from large-scale unlabelled text, the usual approach is to obtain candidate terms from a term extractor and then filter them by hand to keep those that belong to the domain. Manual filtering requires considerable manpower and material resources, and its criteria are hard to keep consistent. This paper proposes using the CBC (Clustering By Committee) method to automatically remove out-of-domain terms from the extracted candidates. As the training corpus is progressively enriched, the method can also recognize new terms and so enlarge the domain's term set. Finally, an evaluation of the experimental results shows that CBC clustering performs well at term filtering.

Key words: CBC (Clustering By Committee), term filtering, corpus, term extraction
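
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of committee-based filtering, not the paper's actual procedure. It assumes each term is represented by a sparse vector of context-feature counts, that the committees (small groups of representative in-domain terms) are already given rather than discovered automatically as in full CBC, and that a candidate is kept when its cosine similarity to some committee centroid reaches a threshold. The function names, the toy data, and the threshold of 0.5 are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average a non-empty list of sparse vectors feature by feature."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds the counts
    return {f: w / len(vectors) for f, w in total.items()}


def filter_terms(candidates, committees, threshold=0.5):
    """Keep candidates whose best committee similarity reaches the threshold.

    Simplification: committees are supplied by the caller (assumption);
    full CBC would first discover them by clustering similar terms.
    """
    centroids = [centroid(members) for members in committees]
    kept = []
    for term, vec in candidates.items():
        best = max((cosine(vec, c) for c in centroids), default=0.0)
        if best >= threshold:
            kept.append(term)
    return kept


# Toy context-feature counts, invented purely for illustration.
committees = [
    [{"disk": 3, "memory": 2}, {"disk": 2, "memory": 3, "cpu": 1}],
]
candidates = {
    "cache": {"memory": 4, "cpu": 2},
    "harvest": {"field": 5, "crop": 3},
}
kept_terms = filter_terms(candidates, committees)
print(kept_terms)
```

With this toy data only "cache" shares features with the committee centroid, so "harvest" is dropped as out-of-domain. Enlarging the corpus would change the feature vectors and centroids, which is one plausible way that new terms could come to be accepted.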