---
title: "Endogenous Reasoning"
created: 2026-05-18
type: concept
tags: ["reasoning", "LLM", "reinforcement-learning"]
sources: ["https://arxiv.org/abs/2604.14142"]
---
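# Endogenous Reasoning
## Definition
Endogenous reasoning is reasoning behavior that a model produces **spontaneously**, rather than behavior induced by external supervision signals or carefully designed prompt templates. NSR-PreRL is reported to elicit this capability markedly.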
## Elicitation effect of NSR-PreRL
After only 20 steps of NSR-PreRL training (Qwen3-4B, AMC23):
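| Reasoning pattern | Growth factor |
|---------|---------|
| Transition thoughts | **14.89×** |
| Reflection thoughts | **6.54×** |
| Subgoal Setting | Large increase (no figure given) |
| Enumeration | Large increase (no figure given) |
| Verification | Large increase (no figure given) |
| Backtracking | Large increase (no figure given) |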
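
A growth factor like those in the table is a ratio of pattern counts after training to counts before training. The following is a minimal sketch of that arithmetic only. The marker phrases, the `PATTERN_MARKERS` dictionary, and both function names are hypothetical illustrations; the source does not say how reasoning patterns were detected or counted.

```python
import re

# Hypothetical marker phrases (an assumption for illustration only);
# the source does not specify how each reasoning pattern was identified.
PATTERN_MARKERS = {
    "transition": [r"\bwait\b", r"\balternatively\b", r"\bhowever\b"],
    "reflection": [r"\blet me check\b", r"\blet me re-?examine\b"],
}


def count_patterns(responses):
    """Count marker occurrences per pattern across a list of response strings."""
    counts = {name: 0 for name in PATTERN_MARKERS}
    for text in responses:
        lowered = text.lower()
        for name, markers in PATTERN_MARKERS.items():
            counts[name] += sum(len(re.findall(m, lowered)) for m in markers)
    return counts


def growth_factors(before, after):
    """Return after/before count ratios; None when the baseline count is zero."""
    base = count_patterns(before)
    new = count_patterns(after)
    return {
        name: (new[name] / base[name]) if base[name] else None
        for name in base
    }
```

Called as `growth_factors(responses_before, responses_after)`, this returns one ratio per pattern. A ratio is only meaningful when both response sets come from the same prompts and the baseline count is nonzero, which is why the zero-baseline case returns `None` instead of a number.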
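## Comparison with GRPO
Standard GRPO (after 25 steps) is clearly weaker at eliciting endogenous reasoning than NSR-PreRL (after only 20 steps), which suggests:
- The elicitation of endogenous reasoning stems from operating in the pre-train space (once the conditioning constraint is removed, the model can freely explore its knowledge space)
- The conditioning constraint of the post-train space **suppresses** the emergence of this spontaneous reasoning behavior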
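## Mechanism hypothesis
By pruning incorrect reasoning paths, NSR-PreRL **indirectly unlocks** internal knowledge that the model encoded during pre-training but that conditioning constraints had suppressed. This is consistent with the idea of "pre-training knowledge internalization": the reasoning ability already exists in the model parameters, and PreRL only removes the "noise paths" that blocked its expression.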
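
The pruning intuition can be illustrated with a toy softmax over three candidate "paths". This sketch assumes that negative sample reinforcement means lowering the log-probability of a sample judged incorrect. It is not the training procedure from the source, and `softmax`, `nsr_step`, and the three-path setup are hypothetical names chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nsr_step(logits, wrong_index, lr=0.5):
    """One gradient-descent step that lowers log p[wrong_index].

    Uses d log p_k / d z_j = 1[j == k] - p_j, so the wrong path's logit
    falls while every other logit rises in proportion to its current
    probability. Assumed toy update, not the source's algorithm.
    """
    probs = softmax(logits)
    grad = [(1.0 if j == wrong_index else 0.0) - p for j, p in enumerate(probs)]
    return [z - lr * g for z, g in zip(logits, grad)]


# Hypothetical setup: path 0 is an incorrect path the model currently prefers.
logits = [2.0, 1.0, 0.0]
before = softmax(logits)
for _ in range(5):
    logits = nsr_step(logits, wrong_index=0)
after = softmax(logits)
```

In this toy, no path is ever rewarded directly. Probability leaves the penalized path and is reallocated according to what the model already preferred among the alternatives, which matches the hypothesis that the capability was already encoded and is only being uncovered.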
## Related concepts
- [[negative-sample-reinforcement|NSR]]
- [[pre-train-space-reinforcement-learning|PreRL]]
- [[dual-space-rl|DSRL]]