• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (11): 2008-2018.

• Artificial Intelligence and Data Mining •

基于大语言模型的司法文本摘要研究

裴炳森,李欣,樊志杰,蒋章涛,孙昊扬,刘梓锐   

  1. (1.中国人民公安大学信息网络安全学院,北京 100038;2.复旦大学计算机科学技术学院,上海 200438;
    3.中国人民公安大学安全防范技术与风险评估公安部重点实验室,北京 100038)

  • 收稿日期:2024-03-27 修回日期:2024-06-28 出版日期:2025-11-25 发布日期:2025-12-08
  • Funding:
    National Key Research and Development Program of China (2022YFC3301305)

Research on judicial text summarization based on large language model

PEI Bingsen, LI Xin, FAN Zhijie, JIANG Zhangtao, SUN Haoyang, LIU Zirui

  1. (1. Information Network Security Academy, People’s Public Security University of China, Beijing 100038;
    2. School of Computer Science and Technology, Fudan University, Shanghai 200438;
    3. Key Laboratory of Security Prevention and Risk Assessment of the Ministry of Public Security,
    People’s Public Security University of China, Beijing 100038, China)
  • Received: 2024-03-27 Revised: 2024-06-28 Online: 2025-11-25 Published: 2025-12-08

摘要: 随着科学技术的不断发展,通用人工智能技术展现了其强大的语言理解和生成能力。在司法领域,人工智能也发挥着越来越重要的作用,司法信息化已经逐渐转变为司法智能化、智慧化。在转变进程中,司法文本的摘要生成是一项重要工作,根据司法文本生成摘要能够实现“降维”的目的,并有助于迅速了解案件详情、获取案件要素,为从业者高效进行信息获取提供支撑。但是目前的司法文本摘要生成技术仍存在部分问题,如:生成摘要中缺少法律条文作为判罚依据,摘要存在语法错误和语句不通等导致摘要可读性不强等一系列问题。为解决上述问题,利用大语言模型出色的语言理解能力和生成能力,结合不同的微调技术并设计不同的提示模板,构建了针对法律文本摘要生成的垂直领域大模型,经过在各类数据集上的验证,表明了该模型的可行性,为大语言模型和司法领域结合提供了可能的方式。

关键词: 文本摘要, 智慧司法, 大语言模型, 参数微调

Abstract: With the continuing development of science and technology, artificial general intelligence (AGI) technology has demonstrated powerful language understanding and generation capabilities. Artificial intelligence also plays an increasingly important role in the judicial field, where judicial informatization is gradually evolving into intelligent, smart justice. In this transition, summarizing judicial texts is an important task: a summary achieves a form of “dimensionality reduction” on the source text, helps readers quickly grasp case details and extract case elements, and supports practitioners in acquiring information efficiently. However, current judicial text summarization techniques still have shortcomings. For example, generated summaries omit the legal provisions that serve as the basis for the judgment, and grammatical errors and incoherent sentences make them hard to read. To address these problems, this paper leverages the strong language understanding and generation capabilities of large language models (LLMs), combines different fine-tuning techniques with different prompt templates, and constructs a domain-specific large model for judicial text summarization. Validation on multiple datasets demonstrates the feasibility of the model and offers a possible way to combine large language models with the judicial field.

Key words: text summarization, smart courts, large language model, parameter fine-tuning