20260514: add new content

2026-05-14 13:54:52 +08:00
parent 56c4d3ef7c
commit b116710e4c
294 changed files with 10682 additions and 255 deletions
--- a/concepts/deepseek-v4-flash.md
+++ b/concepts/deepseek-v4-flash.md
@@ -0,0 +1,27 @@
+---
+title: "DeepSeek-V4-Flash"
+domain: "Deep Learning / LLM"
+tags: [deepseek, llm, moe, backbone]
+sources: [[thinking-with-visual-primitives]], [[deepseek-v4-million-token-context]]
+---
+
+# DeepSeek-V4-Flash
+
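+> Language backbone of "Thinking with Visual Primitives": an MoE model with 284B total parameters and 13B activated parameters.
+
+## Role
+
+Within the visual-primitives framework, DeepSeek-V4-Flash serves as the LLM backbone. It takes the visual tokens produced by [[deepseek-vit|DeepSeek-ViT]] together with the language instruction, and generates a chain of thought interleaved with visual primitives, followed by the final response.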
+
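+## Key features
+
+- [[mixture-of-experts|Mixture of Experts]] (MoE) architecture
+- Built-in [[compressed-sparse-attention|Compressed Sparse Attention]] (CSA) mechanism, the key to its extreme token efficiency
+- Supports million-token-scale long contexts
+- Pretraining uses a 64K sequence length (FP8); post-training extends it to 256K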
+
+## Related concepts
+
+- [[deepseek-vit|DeepSeek-ViT]]: vision encoder
+- [[compressed-sparse-attention|Compressed Sparse Attention]]: KV cache compression
+- [[mixture-of-experts|Mixture of Experts]]: parameter-efficient architecture
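A rough illustration of what "284B total / 13B activated" means for an MoE backbone: each token is routed to only a few experts, so most parameters sit idle on any single forward pass. This is a toy top-k router in plain Python, not DeepSeek's implementation. The expert count (8), `k=2`, softmax gating renormalized over the selected experts, and the names `moe_forward` and `active_fraction` are all hypothetical; only the 284B and 13B figures come from the note.

```python
import math
from typing import Callable, List

# Parameter counts stated in the note, in billions.
TOTAL_PARAMS_B = 284
ACTIVE_PARAMS_B = 13


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters that participate in one token's forward pass."""
    return active_b / total_b


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x: float, gate_logits: List[float],
                experts: List[Callable[[float], float]], k: int) -> float:
    """Toy top-k MoE layer for a scalar token (hypothetical sketch).

    Assumption: gating = pick the k highest logits, then softmax over only
    those k. The real expert count, k, and gating of DeepSeek-V4-Flash are
    not given in this note.
    """
    # Indices of the k experts with the highest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    # Only the selected experts are evaluated; the others do no work for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))


if __name__ == "__main__":
    # 8 toy experts: expert i scales its input by (i + 1).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    y = moe_forward(1.0, logits, experts, k=2)
    print(f"{active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")  # prints 4.6%
    print(f"{y:.2f}")  # prints 4.19
```

With these figures, roughly 4.6% of the weights are active per token, which is how a sparse MoE keeps per-token compute close to that of a much smaller dense model while retaining a large total capacity.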