• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (2): 268-276.

• Artificial Intelligence and Data Mining •

A multi-stage collaborative reasoning framework for legal question answering with large language models

FU Qihang, QIN Yongbin, HUANG Ruizhang, ZHOU Yulin, HU Qingqing

  (1. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang 550025;
    2. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025;
    3. College of Computer Science and Technology, Guizhou University, Guiyang 550025;
    4. School of Law and Public Administration, Guizhou Qiannan College of Science and Technology, Huishui 550600, China)
  • Received: 2025-07-20  Revised: 2025-09-10  Online: 2026-02-25  Published: 2026-03-10

Abstract: In recent years, large language models (LLMs) have shown broad promise in the judicial field. However, in the knowledge-intensive reasoning and complex logical judgment tasks that arise in judicial question-answering scenarios, challenges such as inadequate reasoning capabilities and imprecise application of legal knowledge persist. To address these issues, this paper proposes a decoupled collaborative reasoning framework (DCRF) that separates “thinking” from “reasoning” in a multi-stage cooperative process. First, a fine-tuned lightweight “Thinker” generates high-level reasoning chains that guide downstream reasoning strategies. Then, an unmodified Qwen1.5-14B-chat “Reasoner”, supported by retrieval-augmented generation over relevant statutory texts, performs fine-grained logical inference. By coordinating strategic planning with execution, the framework significantly improves the flexibility and accuracy with which the model invokes legal knowledge, while avoiding the high cost of fine-tuning large models and reducing overall training overhead. On the JEC-QA and DISC-Law-Eval benchmarks, DCRF achieves an average improvement of 9.77 percentage points in accuracy on single-choice questions and an average increase of 7.48 percentage points in F1-score on multiple-choice questions over the base models. Notably, it surpasses DeepSeek-R1-Distill-Qwen-14B on single-choice questions and performs comparably on multiple-choice questions. Experimental results indicate that DCRF effectively strengthens the judicial reasoning capabilities of large language models while reducing training costs.
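The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration of the Thinker → retrieval → Reasoner control flow only; all function names, the toy keyword retrieval, and the prompt layout are assumptions for exposition, not the authors' actual implementation, and the real system would call the fine-tuned Thinker model and the unmodified Qwen1.5-14B-chat Reasoner where the stubs return strings.

```python
# Hypothetical sketch of the DCRF pipeline: a lightweight "Thinker"
# plans, a retriever fetches statutory texts, and a large "Reasoner"
# performs the fine-grained inference. Names are illustrative.

def thinker(question: str) -> str:
    """Stub for the fine-tuned lightweight model: emit a high-level
    reasoning chain that guides the downstream reasoning strategy."""
    return f"Plan: identify the legal issue in '{question}', then apply the relevant statutes."

def retrieve_statutes(question: str, corpus: dict[str, str]) -> list[str]:
    """Toy retrieval-augmented generation step: return statute texts
    whose keyword appears in the question (a real system would use a
    dense or BM25 retriever over a statutory corpus)."""
    return [text for keyword, text in corpus.items() if keyword in question]

def reasoner(question: str, plan: str, statutes: list[str]) -> str:
    """Stub for the unmodified large model (e.g. Qwen1.5-14B-chat):
    condition fine-grained inference on the plan and retrieved texts."""
    prompt = "\n".join([plan, *statutes, f"Question: {question}"])
    return prompt  # a real system would send this prompt to the LLM

def dcrf_answer(question: str, corpus: dict[str, str]) -> str:
    """Coordinate strategic planning (Thinker) with execution (Reasoner)."""
    plan = thinker(question)
    statutes = retrieve_statutes(question, corpus)
    return reasoner(question, plan, statutes)
```

Only the small Thinker is fine-tuned; the large Reasoner is used as-is, which is what lets the framework avoid the cost of fine-tuning the large model.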

Key words: multi-stage reasoning, large language models, legal reasoning, retrieval-augmented generation, instruction tuning