• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (01): 138-148.

• Artificial Intelligence and Data Mining

A survey on quality evaluation of machine-generated texts

QIN Ying

  1. (Artificial Intelligence and Human Languages Laboratory, Beijing Foreign Studies University, Beijing 100089, China)
  • Received: 2020-08-16  Revised: 2020-10-30  Accepted: 2022-01-25  Online: 2022-01-25  Published: 2022-01-13
  • Funding:
    Beijing Foreign Studies University Research Fund (2020SYLZDXM040)

Abstract: The quality evaluation of machine-generated texts strongly influences research on natural language generation (NLG) and has become a bottleneck restricting the development of the field. This paper surveys the methods used to evaluate language quality across NLG tasks in the broad sense, including machine translation, automatic summarization, dialogue systems, image captioning, and machine writing. It introduces the characteristics, strengths, and weaknesses of human evaluation and of automatic metrics, together with openly available evaluation resources. It also analyzes the different evaluation perspectives adopted by each task and the scope in which each method applies. A comparative analysis of these methods can inform efforts to combine them and to explore the key open problems. Overall, the quality evaluation of machine-generated language remains largely limited to comparing surface linguistic forms. Deeper evaluation, such as the accuracy of the meaning expressed and the cohesion and coherence of the text, still poses many challenges. Drawing on these difficulties and on recent progress, the paper outlines research trends in the quality evaluation of generated texts.

Key words: quality evaluation of generated text, machine translation, automatic summarization, dialogue system, image captioning, storytelling