Research on Feature Selection Metrics for Automatic Predicate Identification

J4, 2012, Vol. 34, Issue (9): 188-192.

张宜浩,金澎

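(1. School of Computer Science, Leshan Teachers' College, Leshan, Sichuan 614004; 2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers' College, Leshan, Sichuan 614004)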

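Received: 2012-04-12 Revised: 2012-06-16 Online: 2012-09-25 Published: 2012-09-25
Funding:
Scientific Research Fund of the Sichuan Provincial Education Department (10ZB025); National Natural Science Foundation of China (61003206); Leshan Teachers' College Research and Innovation Team Building Program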

Research on Feature Selection Metric for Predicate Identification

ZHANG Yihao,JIN Peng

(1. School of Computer Science, Leshan Teachers' College, Leshan 614004; 2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers' College, Leshan 614004, China)

Received: 2012-04-12 Revised: 2012-06-16 Online: 2012-09-25 Published: 2012-09-25

Abstract

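Abstract (translated from the Chinese):

Automatic predicate identification is an important part of shallow parsing. This paper proposes a predicate identification method based on the support vector machine (SVM) classification algorithm and focuses on two steps of feature construction: feature selection based on information gain, and a feature-word metric based on TongYiCiCiLin, a Chinese synonym thesaurus. The information gain method keeps the features that most affect classification and thereby reduces the feature dimensionality. The TongYiCiCiLin-based metric maps feature words to deeper semantic concepts, which strengthens the expressive power of the features and emphasizes the relevance of the attribute features to the model. Experiments on a small corpus show that the best F-score for predicate identification reaches 84.0%, which is 4.6% higher than when the data receive no processing. The results indicate that the new feature selection and feature metric methods are very effective for predicate identification and can greatly improve classifier performance.

Keywords: predicate identification, feature selection, TongYiCiCiLin, information gain, support vector machine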

Abstract:

Predicate identification is an important research topic in shallow parsing. In this paper, a predicate identification method based on the support vector machine classification algorithm is proposed. Our focus is on two aspects of feature construction: feature selection with information gain, and a metric for feature words based on TongYiCiCiLin. The information gain method selects the features that have the greatest impact on the classification model, which reduces the dimensionality of the feature vector. The TongYiCiCiLin metric maps feature words to deeper semantic concepts, which enhances the representational power of the features and emphasizes the degree of correlation between the features and the model. Experiments on a relatively small corpus show that the best F-score for predicate identification reaches 84.0%, an increase of 4.6% over the setting in which the data are not processed. The experimental results show that the new feature selection method and the new representation of feature attributes are effective for predicate identification and can greatly improve classification performance.

Key words: predicate identification; feature selection; TongYiCiCiLin; information gain; support vector machine
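
The abstract names two feature-construction steps but gives no implementation details, so the following is only a minimal sketch of how they might fit together, not the authors' method. It assumes binary presence features, uses a hypothetical toy thesaurus (`TOY_THESAURUS`, with invented category codes rather than real TongYiCiCiLin codes), invented sample data, and hypothetical function names. The SVM training step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: word -> coarse semantic category code.
# These codes are placeholders, not real TongYiCiCiLin entries.
TOY_THESAURUS = {
    "eat": "ACT", "drink": "ACT",
    "apple": "FOOD", "tea": "FOOD",
    "quickly": "MANNER",
}


def to_concepts(words):
    """Map each word to its category code; unknown words are kept unchanged."""
    return {TOY_THESAURUS.get(w, w) for w in words}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, feature):
    """Entropy reduction from splitting on presence/absence of `feature`."""
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_k(samples, labels, k):
    """Return the k features with the highest information gain (ties by name)."""
    vocabulary = set().union(*samples)
    ranked = sorted(
        vocabulary,
        key=lambda f: (-information_gain(samples, labels, f), f),
    )
    return ranked[:k]


# Invented toy data: context words of candidate tokens,
# label 1 = predicate, 0 = not a predicate.
raw_samples = [
    ["eat", "apple"],
    ["drink", "tea"],
    ["apple", "quickly"],
    ["tea", "quickly"],
]
labels = [1, 1, 0, 0]

# Step 1: generalize words to semantic categories.
samples = [to_concepts(words) for words in raw_samples]
# Step 2: keep only the most informative category features.
selected = select_top_k(samples, labels, k=2)
print(selected)
```

In this toy setting the category mapping collapses five surface words into three features, and the information gain ranking then discards the category that appears in every sample, since it cannot separate the two classes. The retained features would be the input to a classifier such as an SVM.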

张宜浩, 金澎. Research on Feature Selection Metrics for Automatic Predicate Identification [J]. J4, 2012, 34(9): 188-192.

ZHANG Yihao,JIN Peng. Research on Feature Selection Metric for Predicate Identification[J]. J4, 2012, 34(9): 188-192.
