• A journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 268-276.

• Artificial Intelligence and Data Mining •

  • Funding:
    National Key R&D Program of China (2023YFC3304500); National Natural Science Foundation of China (62066007, 62066008); Science and Technology Major Project of Guizhou Province (Qiankehe Major Project [2024]003)


A multi-stage collaborative reasoning framework for legal question answering with large language models

FU Qihang,QIN Yongbin,HUANG Ruizhang,ZHOU Yulin,HU Qingqing   

  (1. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang 550025;
    2. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025;
    3. College of Computer Science and Technology, Guizhou University, Guiyang 550025;
    4. School of Law and Public Administration, Guizhou Qiannan College of Science and Technology, Huishui 550600, China)
  • Received: 2025-07-20  Revised: 2025-09-10  Online: 2026-02-25  Published: 2026-03-10


Abstract: In recent years, large language models (LLMs) have demonstrated broad prospects in the judicial field. However, in knowledge-intensive reasoning and complex logical judgment tasks within judicial question-answering scenarios, challenges such as inadequate reasoning capabilities and imprecise application of legal knowledge persist. To address these issues, this paper proposes a decoupled collaborative reasoning framework (DCRF) that separates “thinking” from “reasoning” in a multi-stage cooperative process. First, a fine-tuned lightweight “Thinker” generates high-level chains of thought to guide downstream reasoning strategies. Then, an unmodified Qwen1.5-14B-Chat “Reasoner”, supported by retrieval-augmented generation and relevant statutory texts, performs fine-grained logical inference. By coordinating strategic planning with execution, the framework significantly enhances the model’s flexibility and accuracy in invoking legal knowledge, while avoiding the high cost of fine-tuning large models and reducing overall training overhead. On the JEC-QA and DISC-Law-Eval benchmarks, DCRF achieves an average improvement of 9.77 percentage points in accuracy on single-choice questions and an average increase of 7.48 percentage points in F1-score on multiple-choice questions over the baseline models. Notably, it surpasses DeepSeek-R1-Distill-Qwen-14B on single-choice questions and performs comparably on multiple-choice questions. Experimental results indicate that DCRF effectively strengthens the judicial reasoning capabilities of large language models while reducing training costs.
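The abstract describes a three-layer flow: a lightweight Thinker plans, a retriever supplies statutes, and an untuned Reasoner executes. The following is a minimal Python sketch of that control flow under stated assumptions: every function name, the keyword-overlap retriever, and the prompt wording are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the DCRF two-stage pipeline. All names and
# prompts below are illustrative placeholders, not the paper's code.

def thinker(question: str) -> str:
    """Stage 1: a fine-tuned lightweight model emits a high-level
    chain of thought that sets the reasoning strategy."""
    # Placeholder: a real system would call the fine-tuned "Thinker" LLM.
    return (f"Plan: identify the legal issue in '{question}', "
            "locate the governing statute, then apply it to the facts.")

def retrieve_statutes(question: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """RAG step: rank statutes by naive keyword overlap with the
    question (a stand-in for a real dense or sparse retriever)."""
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(word in kv[1] for word in question.split()),
    )
    return [text for _, text in scored[:k]]

def reasoner(question: str, plan: str, statutes: list[str]) -> str:
    """Stage 2: an unmodified general-purpose LLM does fine-grained
    inference, conditioned on the plan and the retrieved statutes."""
    prompt = (
        f"Question: {question}\n"
        f"Strategy: {plan}\n"
        "Relevant statutes:\n" + "\n".join(statutes) + "\n"
        "Answer with step-by-step legal reasoning."
    )
    return prompt  # a real system would send this prompt to the LLM

def dcrf_answer(question: str, corpus: dict[str, str]) -> str:
    plan = thinker(question)                        # strategy layer
    statutes = retrieve_statutes(question, corpus)  # knowledge layer
    return reasoner(question, plan, statutes)       # execution layer
```

The point of the decoupling is visible in the last function: only `thinker` needs fine-tuning, while the expensive Reasoner model is used as-is, which is how the framework avoids large-model training costs.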

Key words: multi-stage reasoning, large language models, legal reasoning, retrieval-augmented generation, instruction tuning