---
title: "Large Reasoning Models (LRMs)"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [reasoning, lrm, cot, r1]
sources:
- gan-thinking-based-non-thinking-2026
---

# Large Reasoning Models (LRMs)

LRMs are advanced language models that use a long [[chain-of-thought|chain of thought]] (CoT) as their core reasoning mechanism. Representative examples are DeepSeek-R1 (Guo et al., 2025) and OpenAI o1 (Jaech et al., 2024).

## How It Works

Given a prompt `x = [query, <think>]`, an LRM generates:

```
y = [y_1, ..., y_τ, </think>, y_{τ+2}, ..., y_m]
```

- `[y_1, ..., y_τ]`: the thinking, i.e. exploration, reflection, and self-verification
- `</think>`: the end-of-thinking marker, occupying position τ+1
- `[y_{τ+2}, ..., y_m]`: the final solution

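The following minimal Python sketch makes the indexing above concrete by splitting a generated response into its thinking and solution parts. It assumes the response is already tokenized into a list of strings that contains exactly one `</think>` token. `split_response` and `END_THINK` are hypothetical names and are not part of any model's API.

```python
from typing import List, Tuple

END_THINK = "</think>"  # end-of-thinking marker from the sequence y above


def split_response(y: List[str]) -> Tuple[List[str], List[str]]:
    """Split y = [y_1..y_tau, </think>, y_{tau+2}..y_m] into (thinking, solution).

    Assumes y holds exactly one </think> token; raises ValueError otherwise.
    """
    if y.count(END_THINK) != 1:
        raise ValueError("expected exactly one </think> token")
    tau = y.index(END_THINK)  # 0-based index of </think>, numerically equal to tau
    thinking = y[:tau]        # y_1 .. y_tau
    solution = y[tau + 1:]    # y_{tau+2} .. y_m
    return thinking, solution


if __name__ == "__main__":
    y = ["Let", "me", "check", "</think>", "2", "+", "2", "=", "4"]
    thinking, solution = split_response(y)
    print(len(thinking), len(solution))  # → 3 5
```

Because Python lists are 0-based, the 0-based index of `</think>` equals τ. This lets both slices be written directly in terms of τ.
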
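## Where the Performance Comes From

The strong performance of LRMs **stems almost entirely from thinking**: the multi-step reasoning, self-correction, and verification carried out in the long CoT. This also means:

- Simple queries go through the full thinking process as well
- Many "Wait... Let me check..." style tokens contribute nothing constructive
- Inference cost and latency rise significantly
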
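The overhead can be quantified per response. This sketch is illustrative only. `thinking_overhead` computes τ/m from the token sequence y defined earlier. `count_hesitations` is a crude, hypothetical heuristic that counts phrases such as "Wait" in the thinking text. Neither function comes from the source, and the default marker list is an assumption.

```python
from typing import List, Sequence

END_THINK = "</think>"


def thinking_overhead(y: List[str]) -> float:
    """Share of generated tokens spent before </think>, i.e. tau / m."""
    if END_THINK not in y:
        raise ValueError("response has no </think> token")
    tau = y.index(END_THINK)  # number of thinking tokens
    return tau / len(y)       # len(y) == m, counting </think> itself


def count_hesitations(thinking_text: str,
                      markers: Sequence[str] = ("Wait", "Let me check")) -> int:
    """Hypothetical heuristic: count (case-sensitive) hesitation phrases."""
    return sum(thinking_text.count(marker) for marker in markers)


if __name__ == "__main__":
    y = ["Wait", "let", "me", "check", "again", "</think>", "4"]
    print(f"{thinking_overhead(y):.2f}")                         # → 0.71
    print(count_hesitations("Wait... Let me check. Wait, no."))  # → 3
```

Plain substring counting misses paraphrased hesitations. It only serves to show how much of m can go to thinking even when the final answer is a single token.
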
## Overthinking and Hybrid Reasoning

The [[overthinking]] problem of LRMs gave rise to [[hybrid-reasoning-models|hybrid reasoning models]], which let the model decide for itself, based on query complexity, whether to think at all.

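## Key Property (Exploited by TNT)

Thinking-mode training of LRMs ensures that the solution after `</think>` **contains no additional thinking**. As a result, the solution length can serve as a reliable estimate of the output length the model would naturally produce in non-thinking mode. This is TNT's core assumption and the foundation of its design.
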
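One possible way to use this property is to sample several thinking-mode responses to a query. Each solution length, m - (τ + 1) tokens, can then serve as an estimate of the non-thinking output length. The source does not describe how TNT aggregates or applies these lengths. Averaging with `statistics.mean` is therefore purely an assumption, and `solution_length` and `estimate_nonthinking_length` are hypothetical names.

```python
from statistics import mean
from typing import List

END_THINK = "</think>"


def solution_length(y: List[str]) -> int:
    """Tokens after </think>: m - (tau + 1). list.index raises ValueError if absent."""
    return len(y) - (y.index(END_THINK) + 1)


def estimate_nonthinking_length(samples: List[List[str]]) -> float:
    """Hypothetical estimator: mean solution length over thinking-mode samples.

    Because the solution carries no extra thinking, its length is taken as a
    proxy for how long a non-thinking answer to the same query would be.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    return float(mean(solution_length(y) for y in samples))


if __name__ == "__main__":
    samples = [
        ["Let", "me", "think", "</think>", "x", "=", "2"],
        ["Hmm", "</think>", "The", "answer", "is", "x", "=", "2"],
    ]
    print(estimate_nonthinking_length(samples))  # → 4.5
```

The mean is only one choice. Under the same assumption, a median or maximum would be equally plausible aggregates.
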
## References

- [[thinking-mode|thinking mode]]
- [[overthinking|overthinking]]
- [[hybrid-reasoning-models|hybrid reasoning models]]
- [[gan-thinking-based-non-thinking-2026|TNT paper]]