SidneyZhang/myWiki

title: Large Reasoning Models (大推理模型)
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: reasoning, lrm, cot, r1
sources: gan-thinking-based-non-thinking-2026

Large Reasoning Models (LRMs)

An LRM is an advanced language model whose core reasoning mechanism is a long chain-of-thought (CoT). Representative examples are DeepSeek-R1 (Guo et al., 2025) and OpenAI o1 (Jaech et al., 2024).

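How it works

Given a prompt x = [query, <think>], an LRM generates:

y = [y_1, ..., y_τ, </think>, y_{τ+2}, ..., y_m]

[y_1, ..., y_τ]: the thinking, i.e. exploration, reflection, and self-verification
</think>: the end-of-thinking marker (position τ+1)
[y_{τ+2}, ..., y_m]: the final solution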
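
The decomposition above can be sketched as a split at the first </think> marker. This is a minimal illustration, not part of any LRM library: the names `LRMOutput` and `split_output` are hypothetical, and tokens are assumed to be plain strings with </think> appearing as a single token.

```python
from typing import List, NamedTuple

THINK_END = "</think>"  # end-of-thinking marker from the definition above


class LRMOutput(NamedTuple):
    thinking: List[str]  # [y_1, ..., y_tau]
    solution: List[str]  # [y_{tau+2}, ..., y_m]


def split_output(tokens: List[str]) -> LRMOutput:
    """Split a generated token sequence at the first </think> marker."""
    if THINK_END not in tokens:
        # Generation stopped before thinking ended, so there is no solution yet.
        return LRMOutput(thinking=list(tokens), solution=[])
    tau = tokens.index(THINK_END)  # 0-based index of the marker equals tau
    return LRMOutput(thinking=tokens[:tau], solution=tokens[tau + 1:])


# Toy output: four thinking tokens, the marker, then four solution tokens.
y = ["Try", "2+2", "Wait", "check", "</think>", "The", "answer", "is", "4"]
out = split_output(y)
print(len(out.thinking), len(out.solution))
```

With a real tokenizer the marker may be a special token id rather than a string, in which case the comparison would be on ids instead.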

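Where the performance comes from

The strong performance of LRMs stems almost entirely from thinking: the multi-step reasoning, self-correction, and verification carried out in the long CoT. But this also means:

Simple queries go through the same full thinking process
Many tokens of the "Wait... Let me check..." kind contribute nothing constructive
Inference cost and latency increase significantly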
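
To make the overhead concrete, one rough measure is the share of generated tokens that is spent before the solution starts. The sketch below assumes string tokens and a single-token </think> marker; `thinking_share` is a hypothetical helper, and the example sequence is made up for illustration, not measured from a model.

```python
def thinking_share(tokens, think_end="</think>"):
    """Fraction of generated tokens spent on thinking, marker included."""
    if not tokens:
        return 0.0
    if think_end in tokens:
        spent = tokens.index(think_end) + 1  # thinking tokens plus the marker
    else:
        spent = len(tokens)  # thinking never ended: everything is overhead
    return spent / len(tokens)


# A trivially easy query that still triggers a full round of thinking.
easy = ["2+2", "is", "4", "Wait", "let", "me", "check", "yes", "</think>", "4"]
print(thinking_share(easy))
```

In this toy case nine of the ten generated tokens precede the one-token solution.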

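Overthinking and hybrid reasoning

The overthinking problem of LRMs gave rise to hybrid-reasoning-models, which let the model decide for itself, based on query complexity, whether to think.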

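Key property (exploited by TNT)

The thinking-mode training of LRMs ensures that the solution part after </think> contains no additional thinking. This makes the solution length a reliable estimate of the natural output length in non-thinking mode. It is the core assumption and design basis of TNT.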
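
A minimal sketch of how such an estimate could be computed from sampled thinking-mode responses follows. Averaging over samples is an assumption made here for illustration; this note does not say how TNT aggregates or applies the lengths. `solution_length` and `estimate_non_thinking_length` are hypothetical names.

```python
from statistics import mean


def solution_length(tokens, think_end="</think>"):
    """Number of tokens after </think>, or None if thinking never ended."""
    if think_end not in tokens:
        return None
    return len(tokens) - tokens.index(think_end) - 1


def estimate_non_thinking_length(samples):
    """Average solution length over the samples that finished thinking.

    Assumption: a plain mean is used as the aggregate; other statistics
    (maximum, a quantile) would fit the same idea.
    """
    lengths = [n for n in map(solution_length, samples) if n is not None]
    return mean(lengths) if lengths else None


samples = [
    ["a", "</think>", "x", "y"],
    ["b", "c", "</think>", "x", "y", "z", "w"],
    ["truncated", "thinking"],  # no marker, so it is skipped
]
print(estimate_non_thinking_length(samples))
```

Responses that were cut off before </think> are skipped rather than counted as zero-length solutions, since they carry no information about the solution length.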

References