• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (9): 188-192.

• Papers

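Research on Feature Selection Metrics for Automatic Predicate Identification

ZHANG Yihao, JIN Peng

  1. (1. School of Computer Science, Leshan Teachers’ College, Leshan, Sichuan 614004; 2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers’ College, Leshan, Sichuan 614004)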
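  • Received: 2012-04-12 Revised: 2012-06-16 Published: 2012-09-25 Released online: 2012-09-25
  • Supported by:

    Scientific Research Fund of the Sichuan Provincial Education Department (10ZB025); National Natural Science Foundation of China (61003206); Scientific Research Innovation Team Building Program of Leshan Teachers’ College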

Research on Feature Selection Metric for Predicate Identification

ZHANG Yihao,JIN Peng   

  1. (1. School of Computer Science, Leshan Teachers’ College, Leshan 614004; 2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers’ College, Leshan 614004, China)
  • Received: 2012-04-12 Revised: 2012-06-16 Online: 2012-09-25 Published: 2012-09-25

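Abstract:

Automatic predicate identification is an important part of shallow parsing. This paper proposes a method for automatic predicate identification based on the support vector machine classification algorithm, and focuses on two steps of feature construction: feature screening based on information gain and a feature-word metric based on TongYiCiCiLin. Information gain selects the features that most influence classification and thereby reduces the feature dimensionality. The TongYiCiCiLin metric maps feature words to deeper semantic concepts, which strengthens the expressive power of the features and emphasizes the relevance of attribute features to the model. Experiments on a small corpus show that the best F-score for predicate identification reaches 84.0%, which is 4.6% higher than when the data receive no processing at all. The results show that the new feature screening and feature metric methods are very effective for predicate identification and can greatly improve classifier performance.

Keywords: predicate identification, feature selection, TongYiCiCiLin, information gain, support vector machine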
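The information-gain screening step mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete feature values and uses the textbook definition IG(Y; X) = H(Y) - H(Y | X). The function names, the toy feature columns, and the top-k cutoff are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Labels of the samples whose feature takes this value.
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def select_top_k(feature_columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(column, labels)
              for name, column in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): 1 = the candidate word is the predicate, 0 = it is not.
toy_labels = [1, 1, 0, 0]
toy_features = {
    "informative": [1, 1, 0, 0],  # perfectly aligned with the label
    "noise": [1, 0, 1, 0],        # independent of the label
}
print(select_top_k(toy_features, toy_labels, k=1))
```

Features that score near zero carry little information about the class and can be dropped before training, which is how this kind of screening lowers the feature dimensionality.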

Abstract:

Predicate identification is an important research topic in shallow parsing. In this paper, a predicate identification method based on the support vector machine classification algorithm is proposed. Our focus is on a feature selection method based on information gain and a feature-word metric based on TongYiCiCiLin. The information gain method selects the features that have the greatest impact on the classification model, which reduces the dimensionality of the feature vector. TongYiCiCiLin maps feature words to deeper semantic concepts, enhances the representational power of the features, and emphasizes the degree of correlation between the features and the model. Experiments on a relatively small corpus show that the best F-score of predicate identification reaches 84.0%, an increase of 4.6% over the setting in which the data receive no processing. The experimental results show that the new feature selection and feature metric methods are effective for predicate identification and can greatly improve classification performance.

Key words: predicate identification; feature selection; TongYiCiCiLin; information gain; support vector machine
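The idea of mapping feature words to semantic concepts through TongYiCiCiLin can be sketched as a dictionary lookup followed by truncation of a hierarchical category code. The sketch assumes five-level codes in the style of the extended edition of the thesaurus (for example `Aa01A01=`). The abstract does not say which level the authors used, so the `level` parameter, the fallback to the surface word, and every entry in the toy lexicon are hypothetical.

```python
# Prefix lengths for the five hierarchy levels of a code such as "Aa01A01="
# (assumed layout: letter, letter, two digits, letter, two digits, marker).
LEVEL_PREFIX_LEN = {1: 1, 2: 2, 3: 4, 4: 5, 5: 7}

# Hypothetical toy lexicon: the codes are placeholders, not real thesaurus entries.
TOY_CILIN = {
    "说": "Hi01A01=",
    "讲": "Hi01A02=",
    "吃": "Hj02B01=",
}


def concept_feature(word, level=3, lexicon=TOY_CILIN):
    """Map a word to its concept code truncated at the given hierarchy level.

    Words missing from the lexicon fall back to their surface form, so the
    feature is always defined (an assumption of this sketch).
    """
    code = lexicon.get(word)
    if code is None:
        return word
    return code[:LEVEL_PREFIX_LEN[level]]


# Two near-synonyms collapse onto one concept feature at level 3.
print(concept_feature("说"), concept_feature("讲"), concept_feature("吃"))
```

Replacing surface words with a shared concept code lets sparse lexical features pool their evidence, which is one plausible reading of how the semantic mapping strengthens the feature representation.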