20260617: currently 914 pages
28
concepts/deepseek-r1.md
Normal file
@@ -0,0 +1,28 @@
---
title: "DeepSeek-R1"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [reasoning-model, llm, deepseek, placeholder]
sources: []
---

# DeepSeek-R1

> An open-source reasoning model released by DeepSeek (Guo et al., 2025) that uses reinforcement learning to incentivize reasoning ability and reaches leading results on multiple benchmarks.

## Key features
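
- Reasoning ability is trained primarily through reinforcement learning (RL) rather than relying on supervised fine-tuning (SFT) alone
- Generates explicit reasoning tokens (thinking tokens) before producing the final response
- Trained mainly on single-turn reasoning data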
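
The "reasoning tokens first, response second" layout can be illustrated with a small parser. This is a minimal sketch, assuming the reasoning segment is wrapped in `<think>...</think>` tags ahead of the reply; the helper name `split_reasoning` is hypothetical and not part of any DeepSeek tooling.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> and precedes the reply.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, reply) extracted from one raw model output string."""
    match = _THINK_RE.search(output)
    if match is None:
        # No reasoning segment found: treat the whole output as the reply.
        return "", output.strip()
    reasoning = match.group(1).strip()
    reply = output[match.end():].strip()
    return reasoning, reply


raw = "<think>2 + 2 is 4, doubled is 8.</think>The answer is 8."
reasoning, reply = split_reasoning(raw)
print("reasoning:", reasoning)
print("reply:", reply)
```

Keeping the two parts separate is what makes it possible to drop the reasoning from later context while retaining the reply.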
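
## Limitations in multi-turn reasoning

[[goru-one-pass-to-reason-2025]] points out that DeepSeek-R1 follows the common industry convention of discarding reasoning tokens in subsequent turns, which makes multi-turn fine-tuning inefficient: an N-turn conversation requires N forward passes.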
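
One way to see where the N passes come from is to build the training sequence for each turn explicitly. The sketch below is an illustration under stated assumptions, not the method of the cited work: the `Turn` class, the `training_sequences` helper, and the string-based segments are all hypothetical stand-ins for real tokenized data.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    user: str
    reasoning: str
    reply: str


def training_sequences(turns: list[Turn]) -> list[list[str]]:
    """Build one training sequence per assistant turn.

    Earlier turns contribute only their user message and reply (reasoning
    is dropped); the turn being trained keeps its reasoning segment.
    """
    sequences = []
    for i, current in enumerate(turns):
        seq: list[str] = []
        for past in turns[:i]:
            seq += [f"user: {past.user}", f"assistant: {past.reply}"]
        seq += [
            f"user: {current.user}",
            f"reasoning: {current.reasoning}",
            f"assistant: {current.reply}",
        ]
        sequences.append(seq)
    return sequences


conversation = [
    Turn("q1", "r1", "a1"),
    Turn("q2", "r2", "a2"),
    Turn("q3", "r3", "a3"),
]
sequences = training_sequences(conversation)
print(len(sequences))  # one sequence per turn
for seq in sequences:
    print(seq)
```

Because the reasoning segment of turn i is present in sequence i but absent from sequence i+1, no sequence is a prefix of the next one, so a naive setup processes each of the N sequences in its own forward pass.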

## Related

- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]
- [[multi-turn-reasoning]]
- [[visibility-constraint]]