• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (11): 92-96.doi: 10.3969/j.issn.1007130X.2010.

• Articles •

基于结构相似匹配的SQL程序自动评估模型研究

杨鹤标,刘玲,杨立凡   

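  1. (School of Computer Science and Telecommunications Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013)
  • Received: 2009-09-27 Revised: 2009-12-13 Published: 2010-11-25 Released: 2010-11-25
  • Corresponding author: YANG Hebiao
  • About the authors: YANG Hebiao (born 1960), male, from Zhenjiang, Jiangsu, professor; his research interests are data mining, software architecture, information systems, and software engineering. LIU Ling, master's student; her research interests are databases and data mining. YANG Lifan, undergraduate student.
  • Supported by:

    Jiangsu Province High-Tech Research Funding Project (BG2007028)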

A Study of the Automated Programming Assessment Model for SQL Based on Structure Similarity Matching

YANG Hebiao,LIU Ling,YANG Lifan   

  1. (School of Computer Science and Telecommunications Engineering,Jiangsu University,Zhenjiang 212013,China)
  • Received:2009-09-27 Revised:2009-12-13 Online:2010-11-25 Published:2010-11-25

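Abstract (from the Chinese text): Assessing SQL programming ability is difficult and prone to bias because the assessment depends on many factors and has fuzzy boundaries. To address this, the paper proposes an assessment model based on structural similarity matching (SQLAPAM). Combining static and dynamic assessment methods, it presents the overall framework of the model. The model normalizes and tokenizes each submitted SQL statement, converts it into an equivalent sequence of token pairs, and builds the corresponding structure tree (Stree). It then computes the similarity to the target tree using a tree edit distance algorithm improved in two respects: the cost model and the substructure contribution factor. Finally, drawing on the normal distribution, it maps the similarity value to a score interval and uses a similarity threshold to correct the bias introduced by the influencing factors, yielding a quantitative assessment of the SQL program. The model is then analyzed and validated experimentally, and its parameters are tuned on a training data set to optimize it.

Keywords: similarity analysis, automated assessment, tokenization, tree edit distance, normal distribution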

Abstract: Assessing SQL programming ability is difficult and prone to deviation because the assessment depends on multiple factors and has fuzzy boundaries. To address this, the paper introduces an automated programming assessment model for SQL (SQLAPAM) based on structure similarity matching. The overall framework of the model combines static and dynamic assessment methods. After normalization and tokenization, each submitted SQL statement is transformed into an equivalent sequence of token pairs, from which the model constructs a corresponding structure tree (Stree). The model then calculates the similarity between this tree and the target tree using a tree edit distance algorithm improved in its cost model and its substructure contribution factor. Finally, the model maps the similarity value to score intervals with reference to the normal distribution and uses a similarity threshold to adjust for the deviation introduced by the influencing factors, producing a quantitative assessment result for the SQL program.
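
The first stages summarized in the abstract (normalize, tokenize, build a structure tree, compare it with a target tree) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration, not the model described in the paper: the function names (`normalize`, `tokenize`, `build_stree`, `tree_edit_distance`, `similarity`), the flat clause-level tree shape, the use of plain tokens instead of token pairs, and the normalization by combined tree size are all hypothetical. The distance is a plain unit-cost, top-down (Selkow-style) tree edit distance, which stands in for the paper's improved algorithm; its cost model and substructure contribution factor are not given in the abstract.

```python
import re
from dataclasses import dataclass, field

# Hypothetical subset of keywords that open a clause in this sketch.
CLAUSE_KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP", "HAVING", "ORDER"}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def normalize(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and uppercase.

    Simplification: string literals are uppercased too.
    """
    return re.sub(r"\s+", " ", sql.strip().rstrip(";")).upper()


def tokenize(sql: str) -> list:
    """Split normalized SQL into identifiers, numbers, and operators."""
    return re.findall(r"\w+(?:\.\w+)*|<=|>=|<>|!=|[^\s\w]", normalize(sql))


def build_stree(sql: str) -> Node:
    """Build a two-level tree: QUERY -> clause keyword -> clause tokens."""
    root = Node("QUERY")
    current = root
    for tok in tokenize(sql):
        if tok in CLAUSE_KEYWORDS:
            current = Node(tok)
            root.children.append(current)
        elif tok != ",":  # commas carry no structure in this sketch
            current.children.append(Node(tok))
    return root


def tree_size(t: Node) -> int:
    return 1 + sum(tree_size(c) for c in t.children)


def tree_edit_distance(a: Node, b: Node) -> int:
    """Unit-cost top-down edit distance: relabel a node, or insert/delete
    a whole subtree (cost = number of nodes in that subtree)."""
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + tree_size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + tree_size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + tree_size(a.children[i - 1]),  # delete subtree
                d[i][j - 1] + tree_size(b.children[j - 1]),  # insert subtree
                d[i - 1][j - 1]
                + tree_edit_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + d[m][n]


def similarity(a: Node, b: Node) -> float:
    """Scale the distance into [0, 1]; 1.0 means identical trees."""
    return 1.0 - tree_edit_distance(a, b) / (tree_size(a) + tree_size(b))
```

For instance, comparing `build_stree("select name,age from student where age >= 18;")` with `build_stree("SELECT name, age FROM student WHERE age > 18")` gives an edit distance of 1, because the two trees differ only in the comparison operator; differences in case, spacing, and the trailing semicolon are removed by normalization.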

Key words: similarity analysis; automated assessment; tokenization; tree edit distance; normal distribution
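
The final step named in the abstract, mapping a similarity value to a score with reference to the normal distribution and a similarity threshold, is not specified in detail on this page. The sketch below shows one plausible reading only, under loudly stated assumptions: similarity values are treated as roughly normally distributed with mean `mu` and standard deviation `sigma`, the score is the normal CDF scaled to `max_score`, and any similarity at or above `full_mark_threshold` earns full marks. The function name, all parameter names, and all default values are hypothetical, not taken from the paper.

```python
from statistics import NormalDist


def similarity_to_score(
    sim: float,
    mu: float = 0.6,                    # assumed mean similarity of a cohort
    sigma: float = 0.15,                # assumed spread of similarity values
    full_mark_threshold: float = 0.95,  # assumed cut-off for full marks
    max_score: float = 100.0,
) -> float:
    """Map a similarity in [0, 1] to a score in [0, max_score]."""
    if not 0.0 <= sim <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # Threshold adjustment: near-identical answers are not penalized
    # for small structural differences.
    if sim >= full_mark_threshold:
        return max_score
    # Normal CDF: the score reflects where this similarity falls
    # within the assumed distribution.
    return round(max_score * NormalDist(mu, sigma).cdf(sim), 1)
```

With these defaults, a similarity equal to the assumed mean (0.6) maps to half of `max_score`. In practice `mu`, `sigma`, and the threshold would be fitted to data, which matches the abstract's remark that parameters are tuned on a training data set.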