20260514: add new content

2026-05-14 13:54:52 +08:00
parent 56c4d3ef7c
commit b116710e4c
294 changed files with 10682 additions and 255 deletions
--- a/concepts/token-efficiency.md
+++ b/concepts/token-efficiency.md
@@ -0,0 +1,55 @@
+---
+title: "Token Efficiency"
+domain: "Multimodal AI / Efficiency"
+tags: [token-efficiency, visual-token, compression]
+sources: [[thinking-with-visual-primitives]]
+---
+
+# Token Efficiency
+
+> Achieving comparable or stronger reasoning with far fewer visual tokens is the core architectural advantage of "Thinking with Visual Primitives".
+
+## Motivation
+
+Frontier multimodal models commonly spend many visual tokens per image to compensate for weaknesses in visual perception:
+- GPT-5.4: ~740 tokens/image
+- Claude-Sonnet-4.6: ~870 tokens/image
+- Gemini-3-Flash: ~1,100 tokens/image
+
+A high token budget means:
+- longer inference latency
+- a larger KV cache memory footprint
+- higher API costs
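To make the KV cache point concrete, here is a back-of-the-envelope estimate. It is a sketch under assumed numbers: the layer count, KV-head count, head size and fp16 storage are a hypothetical decoder configuration, not the published configuration of any model listed above; only the per-image token budgets come from this note.

```python
# Hypothetical decoder config: NOT the published config of any model listed above.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # fp16 / bf16 storage


def kv_cache_bytes(num_tokens: int,
                   num_layers: int = NUM_LAYERS,
                   num_kv_heads: int = NUM_KV_HEADS,
                   head_dim: int = HEAD_DIM,
                   bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens.

    The leading factor 2 accounts for storing both K and V at every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


if __name__ == "__main__":
    # Per-image visual token budgets quoted in this note.
    budgets = {"GPT-5.4": 740, "Claude-Sonnet-4.6": 870, "Gemini-3-Flash": 1100}
    for name, tokens in budgets.items():
        mib = kv_cache_bytes(tokens) / 2**20
        print(f"{name:>18}: {tokens:>5} tokens -> {mib:6.1f} MiB per image")
```

Under these assumptions each cached token costs 128 KiB, so memory grows linearly with the budget: 740 tokens already occupy 92.5 MiB per image.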
+
+## DeepSeek's approach
+
+```
+756×756 image
+  → Patch Embedding (14×14 patches): 54×54 = 2,916 tokens
+    → 3×3 spatial compression: 18×18 = 324 visual tokens
+      → CSA compression: 81 KV entries (~90 in KV cache)
+```
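The 3×3 spatial compression step can be sketched as merging every non-overlapping 3×3 block of the 54×54 patch grid into one token. This note does not describe how DeepSeek-ViT actually merges a block. The sketch assumes plain concatenation of the nine feature vectors (space-to-depth), and `patch_grid_side` and `merge_3x3` are hypothetical helper names.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> feature vector of one patch token


def patch_grid_side(image_side: int, patch: int = 14) -> int:
    """Number of patches along one side of a square image (756 // 14 == 54)."""
    if image_side % patch:
        raise ValueError("image side must be a multiple of the patch size")
    return image_side // patch


def merge_3x3(grid: Grid, k: int = 3) -> Grid:
    """Merge each non-overlapping k x k block of tokens into one token.

    Assumption: merging concatenates the k*k feature vectors row by row
    (space-to-depth). A real encoder would typically add a learned
    projection afterwards; that step is omitted here.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % k or cols % k:
        raise ValueError("grid sides must be divisible by k")
    merged: Grid = []
    for r in range(0, rows, k):
        out_row: List[Vector] = []
        for c in range(0, cols, k):
            block: Vector = []
            for dr in range(k):
                for dc in range(k):
                    block.extend(grid[r + dr][c + dc])
            out_row.append(block)
        merged.append(out_row)
    return merged


if __name__ == "__main__":
    side = patch_grid_side(756)
    # Toy 1-dimensional features: each token stores its flat index.
    grid = [[[float(r * side + c)] for c in range(side)] for r in range(side)]
    merged = merge_3x3(grid)
    print(side * side, "->", len(merged) * len(merged[0]), "tokens")
```

Concatenation keeps all patch information and moves it into the channel dimension, so the sequence shrinks 9× while each token becomes 9× wider.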
+
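+**Overall compression ratio: 7,056×** (756 × 756 = 571,536 pixels → 81 KV entries; measured from the 2,916 patch tokens, the reduction is 36×).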
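The stage counts and both ratios follow from simple arithmetic. In this sketch the 4:1 CSA ratio is inferred from the 324 → 81 step above rather than taken from a specification, and `compression_stages` is a hypothetical helper.

```python
def compression_stages(image_side: int = 756, patch: int = 14,
                       merge: int = 3, kv_ratio: int = 4) -> dict:
    """Token / entry counts at each stage of the pipeline for a square image."""
    pixels = image_side * image_side
    patch_tokens = (image_side // patch) ** 2        # 54 x 54 patch grid
    visual_tokens = patch_tokens // (merge * merge)  # 3 x 3 spatial merge
    kv_entries = visual_tokens // kv_ratio           # assumed 4:1 CSA ratio
    return {
        "pixels": pixels,
        "patch_tokens": patch_tokens,
        "visual_tokens": visual_tokens,
        "kv_entries": kv_entries,
    }


if __name__ == "__main__":
    s = compression_stages()
    print(s)
    # Pixels per KV entry: the overall compression ratio.
    print("pixels per KV entry:", s["pixels"] // s["kv_entries"])
    # Patch tokens per KV entry: the token-level reduction.
    print("patch tokens per KV entry:", s["patch_tokens"] // s["kv_entries"])
```

With the defaults this reproduces 2,916 → 324 → 81, a ratio of 7,056 pixels per KV entry, and a 36× reduction from patch tokens.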
+
+## Performance comparison
+
+| Model | Visual tokens / KV entries per image (≈) | CountQA EM | SpatialMQA |
+|------|-------------|------------|------------|
+| **Ours** | **~90** | **66.1** | **69.4** |
+| GPT-5.4 | ~740 | 48.3 | 61.9 |
+| Gemini-3-Flash | ~1,100 | 34.8 | 58.2 |
+
+> Better or comparable performance at roughly 1/8 to 1/12 of the token budget.
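The 1/8 to 1/12 range can be checked directly against the table. This is a minimal sketch; the dictionary simply restates the table rows, and `budget_ratio` is a hypothetical helper.

```python
OURS_KV_ENTRIES = 90
OURS_COUNTQA_EM = 66.1

BASELINES = {
    # model: (visual tokens per image, CountQA EM), copied from the table
    "GPT-5.4": (740, 48.3),
    "Gemini-3-Flash": (1100, 34.8),
}


def budget_ratio(baseline_tokens: int, ours: int = OURS_KV_ENTRIES) -> float:
    """How many times larger a baseline's per-image budget is than ours."""
    return baseline_tokens / ours


if __name__ == "__main__":
    for name, (tokens, em) in BASELINES.items():
        ratio = budget_ratio(tokens)
        gap = OURS_COUNTQA_EM - em
        print(f"{name}: {ratio:.1f}x the budget, CountQA EM gap {gap:+.1f}")
```

The ratios come out to about 8.2× for GPT-5.4 and 12.2× for Gemini-3-Flash, matching the 1/8 to 1/12 claim.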
+
+## Key enabling techniques
+
+- [[compressed-sparse-attention|Compressed Sparse Attention]]: compression at the KV cache level
+- [[deepseek-vit|DeepSeek-ViT]]: 3×3 spatial token compression
+- [[visual-primitives|Visual Primitives]]: higher information density per token
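This note only states that CSA reduces 324 visual tokens to 81 KV entries (4:1). As a hedged illustration of what compression at the KV cache level means, the sketch below swaps in plain mean pooling over blocks of 4 consecutive entries. It is an assumption for illustration, not DeepSeek's CSA algorithm, and `pool_kv` is a hypothetical name.

```python
from typing import List, Tuple

Vec = List[float]


def pool_kv(keys: List[Vec], values: List[Vec],
            block: int = 4) -> Tuple[List[Vec], List[Vec]]:
    """Compress a KV sequence by mean-pooling each run of `block` entries.

    Stand-in only: a generic block-pooling compressor, NOT DeepSeek's CSA.
    A trailing partial block is averaged over the entries it contains.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if block <= 0:
        raise ValueError("block must be positive")

    def pool(seq: List[Vec]) -> List[Vec]:
        out: List[Vec] = []
        for i in range(0, len(seq), block):
            chunk = seq[i:i + block]
            dim = len(chunk[0])
            # Element-wise mean of the vectors in this block.
            out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
        return out

    return pool(keys), pool(values)


if __name__ == "__main__":
    # 324 toy visual-token entries with 2-dimensional keys/values.
    kv = [[float(i), 2.0 * i] for i in range(324)]
    ck, cv = pool_kv(kv, kv)
    print(len(kv), "->", len(ck), "KV entries")
```

Attention then runs over the pooled entries, so the cache shrinks by the block size at the cost of mixing neighboring tokens.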
+
+## Related concepts
+
+- [[compressed-sparse-attention|Compressed Sparse Attention]]: the core compression mechanism
+- [[deepseek-vit|DeepSeek-ViT]]: the vision encoder
+- [[visual-primitives|Visual Primitives]]: higher information density