20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/deepseek-r1.md
+++ b/concepts/deepseek-r1.md
@@ -0,0 +1,28 @@
+---
+title: "DeepSeek-R1"
+created: 2025-06-02
+updated: 2025-06-02
+type: concept
+tags: [reasoning-model, llm, deepseek, placeholder]
+sources: []
+---
+
+# DeepSeek-R1
+
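+> An open-source reasoning model released by DeepSeek (Guo et al., 2025) that uses reinforcement learning to incentivize reasoning ability and reaches leading results on multiple benchmarks.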
+
+## Key Characteristics
+
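+- Reasoning ability trained primarily through RL rather than SFT (the R1-Zero variant uses RL with no SFT; R1 itself adds a cold-start SFT stage)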
+- Generates explicit reasoning tokens (thinking tokens) before producing the response
+- Trained mainly on single-turn reasoning data
+
+## Limitations in Multi-Turn Reasoning
+
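+[[goru-one-pass-to-reason-2025]] notes that DeepSeek-R1 follows the common industry convention of discarding reasoning tokens in subsequent turns, which makes multi-turn fine-tuning inefficient: an N-turn conversation requires N forward passes.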
+
+## Related
+
+- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]
+- [[multi-turn-reasoning]]
+- [[visibility-constraint]]
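
The multi-turn limitation described in the note can be illustrated with a small sketch. It assumes a conversation is a list of turns, each with a user message, a reasoning trace, and a response, and that earlier turns appear in later contexts with their reasoning removed. The `Turn` class, the helper names, and the `<user>`, `<assistant>`, and `<think>` markers are hypothetical placeholders for illustration, not DeepSeek-R1's actual chat template or the method from the cited paper. Under these assumptions, the training sequence for one turn is never a prefix of the sequence for the next turn, so each turn needs its own forward pass.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical container for one round of a conversation.
    user: str
    reasoning: str
    response: str


def build_training_sequences(turns):
    """Return one training sequence (list of segments) per assistant turn.

    Earlier turns are replayed without their reasoning, while the turn
    being trained on keeps its reasoning as part of the target.
    """
    sequences = []
    for k, turn in enumerate(turns):
        seq = []
        for prev in turns[:k]:
            seq.append(f"<user>{prev.user}")
            # Reasoning from earlier turns is dropped from the context.
            seq.append(f"<assistant>{prev.response}")
        seq.append(f"<user>{turn.user}")
        # The current turn keeps its reasoning so it can be supervised.
        seq.append(f"<assistant><think>{turn.reasoning}</think>{turn.response}")
        sequences.append(seq)
    return sequences


def is_prefix(shorter, longer):
    """True if `shorter` matches the leading segments of `longer`."""
    return longer[:len(shorter)] == shorter


if __name__ == "__main__":
    turns = [
        Turn("q1", "r1", "a1"),
        Turn("q2", "r2", "a2"),
        Turn("q3", "r3", "a3"),
    ]
    sequences = build_training_sequences(turns)
    # One distinct sequence per turn, so N turns imply N forward passes.
    print(len(sequences))
    # No sequence can be reused as the prefix of the next one.
    print(any(is_prefix(a, b) for a, b in zip(sequences, sequences[1:])))
```

Because the second segment of the first sequence contains the reasoning while the same position in the second sequence does not, the sequences diverge early and cannot share a single pass in this naive setup.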