---
title: "DeepSeek-R1"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [reasoning-model, llm, deepseek, placeholder]
sources: []
---
# DeepSeek-R1
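> An open-source reasoning model released by DeepSeek (Guo et al., 2025). It uses reinforcement learning to incentivize reasoning ability and reaches leading results on multiple benchmarks.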
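## Key characteristics
- Reasoning ability is trained with RL (not SFT)
- Generates explicit reasoning tokens (thinking tokens) first, then generates the response
- Trained mainly on single-turn reasoning data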
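
The "reasoning tokens first, then the response" behavior can be illustrated with a minimal sketch that separates the two parts of a single model output. It assumes the reasoning span is wrapped in `<think>...</think>` tags; the helper name `split_reasoning` is hypothetical and is not part of any DeepSeek library.

```python
import re

# Assumption: the reasoning span is delimited by <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Split one model output into (reasoning, answer).

    Hypothetical helper for illustration only.
    """
    match = THINK_RE.search(output)
    if match is None:
        # No explicit reasoning span: treat the whole output as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the reasoning span.
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 is 4</think>The answer is 4.")
```

Here `reasoning` holds the thinking tokens and `answer` holds the user-visible response.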
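## Limitations in multi-turn reasoning
[[goru-one-pass-to-reason-2025]] points out that DeepSeek-R1 follows the industry convention of discarding reasoning tokens in subsequent turns, which makes multi-turn fine-tuning inefficient (it requires N forward passes).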
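
To see why discarding reasoning tokens leads to N forward passes, consider this rough sketch of how training sequences might be built for an N-turn conversation. The `Turn` class, the `build_training_sequences` helper, and the `user:` / `assistant:` / `<think>` text format are all assumptions made for illustration, not the setup described in the cited paper.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    user: str
    reasoning: str
    answer: str


def build_training_sequences(turns: list[Turn]) -> list[tuple[list[str], str]]:
    """Return one (context, target) pair per turn.

    Hypothetical sketch: the target of each turn contains its reasoning,
    but the history carried into later turns keeps only the answer.
    """
    sequences = []
    history: list[str] = []
    for turn in turns:
        context = history + [f"user: {turn.user}"]
        target = f"assistant: <think>{turn.reasoning}</think>{turn.answer}"
        sequences.append((context, target))
        # Reasoning tokens are dropped before the next turn sees this one.
        history = context + [f"assistant: {turn.answer}"]
    return sequences


conversation = [
    Turn("q1", "think-1", "a1"),
    Turn("q2", "think-2", "a2"),
    Turn("q3", "think-3", "a3"),
]
sequences = build_training_sequences(conversation)
```

Each turn's target includes its reasoning, while the context of the next turn does not, so no sequence is a prefix of a later one. Under these assumptions, an N-turn conversation yields N separate sequences, each needing its own forward pass.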
## Related
- [[goru-one-pass-to-reason-2025|One-Pass to Reason]]
- [[multi-turn-reasoning]]
- [[visibility-constraint]]