• A journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 268-276.

• Artificial Intelligence and Data Mining •

  • Funding:
    National Key R&D Program of China (2023YFC3304500); National Natural Science Foundation of China (62066007, 62066008); Science and Technology Major Project of Guizhou Province (Qiankehe Major Project [2024]003)


A multi-stage collaborative reasoning framework for legal question answering with large language models

FU Qihang,QIN Yongbin,HUANG Ruizhang,ZHOU Yulin,HU Qingqing   

  (1. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang 550025;
    2. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025;
    3. College of Computer Science and Technology, Guizhou University, Guiyang 550025;
    4. School of Law and Public Administration, Guizhou Qiannan College of Science and Technology, Huishui 550600, China)
  • Received: 2025-07-20  Revised: 2025-09-10  Online: 2026-02-25  Published: 2026-03-10


Abstract: In recent years, large language models (LLMs) have demonstrated broad prospects in the judicial field. However, in knowledge-intensive reasoning and complex logical judgment tasks within judicial question-answering scenarios, challenges such as inadequate reasoning capabilities and imprecise application of legal knowledge persist. To address these issues, this paper proposes a decoupled collaborative reasoning framework (DCRF) that separates “thinking” from “reasoning” in a multi-stage cooperative process. First, a fine-tuned lightweight “Thinker” generates high-level chains of thought to guide downstream reasoning strategies. Then, an unmodified Qwen1.5-14B-Chat “Reasoner”, supported by retrieval-augmented generation and relevant statutory texts, performs fine-grained logical inference. By coordinating strategic planning with execution, the framework significantly enhances the model’s flexibility and accuracy in invoking legal knowledge, while avoiding the high cost of fine-tuning large models and reducing overall training overhead. On the JEC-QA and DISC-Law-Eval benchmarks, DCRF achieves an average improvement of 9.77 percentage points in accuracy on single-choice questions and an average increase of 7.48 percentage points in F1-score on multiple-choice questions over the baseline models. Notably, it surpasses DeepSeek-R1-Distill-Qwen-14B on single-choice questions and performs comparably on multiple-choice questions. Experimental results indicate that DCRF effectively strengthens the judicial reasoning capabilities of large language models while reducing training costs.
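The abstract describes a three-layer flow: a lightweight Thinker plans, a retriever supplies statutes, and an untuned Reasoner executes. The following is a minimal Python sketch of that control flow under stated assumptions: every function name, the keyword-overlap retriever, and the prompt wording are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the DCRF two-stage pipeline. All names and
# prompts below are illustrative placeholders, not the paper's code.

def thinker(question: str) -> str:
    """Stage 1: a fine-tuned lightweight model emits a high-level
    chain of thought that sets the reasoning strategy."""
    # Placeholder: a real system would call the fine-tuned "Thinker" LLM.
    return (f"Plan: identify the legal issue in '{question}', "
            "locate the governing statute, then apply it to the facts.")

def retrieve_statutes(question: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """RAG step: rank statutes by naive keyword overlap with the
    question (a stand-in for a real dense or sparse retriever)."""
    scored = sorted(
        corpus.items(),
        key=lambda kv: -sum(word in kv[1] for word in question.split()),
    )
    return [text for _, text in scored[:k]]

def reasoner(question: str, plan: str, statutes: list[str]) -> str:
    """Stage 2: an unmodified general-purpose LLM does fine-grained
    inference, conditioned on the plan and the retrieved statutes."""
    prompt = (
        f"Question: {question}\n"
        f"Strategy: {plan}\n"
        "Relevant statutes:\n" + "\n".join(statutes) + "\n"
        "Answer with step-by-step legal reasoning."
    )
    return prompt  # a real system would send this prompt to the LLM

def dcrf_answer(question: str, corpus: dict[str, str]) -> str:
    plan = thinker(question)                        # strategy layer
    statutes = retrieve_statutes(question, corpus)  # knowledge layer
    return reasoner(question, plan, statutes)       # execution layer
```

The point of the decoupling is visible in the last function: only `thinker` needs fine-tuning, while the expensive Reasoner model is used as-is, which is how the framework avoids large-model training costs.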

Key words: multi-stage reasoning, large language models, legal reasoning, retrieval-augmented generation, instruction tuning