A survey on quality evaluation of machine generated texts

Computer Engineering & Science, 2022, Vol. 44, Issue (1): 138-148.

QIN Ying

(Artificial Intelligence and Human Languages Laboratory, Beijing Foreign Studies University, Beijing 100089, China)

Received: 2020-08-16 Revised: 2020-10-30 Online: 2022-01-25 Published: 2022-01-13
Funding:
University-level research fund of Beijing Foreign Studies University (2020SYLZDXM040)

Abstract

Abstract: How the quality of generated text is evaluated strongly shapes research in natural language generation (NLG), and evaluation has become a bottleneck for the field. This paper surveys quality evaluation methods across NLG tasks in a broad sense, including machine translation, automatic summarization, dialogue systems, image captioning, and machine writing. It describes the characteristics, strengths, and weaknesses of human evaluation and automatic evaluation, introduces open evaluation resources, and analyzes the evaluation perspectives and scope of applicability for each task. Comparing the methods side by side offers a reference for combining methods and for exploring key open problems. Overall, quality evaluation of machine-generated language is still largely limited to comparing surface linguistic forms, and deeper evaluation of semantic accuracy, cohesion, and coherence remains challenging. Based on these difficulties and the progress of current research, the paper analyzes research trends in the quality evaluation of generated text.

Key words: quality evaluation of generated text, machine translation, automatic summarization, dialogue system, image captioning, storytelling

QIN Ying. A survey on quality evaluation of machine generated texts[J]. Computer Engineering & Science, 2022, 44(1): 138-148.
