20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/constant-kv-cache.md
+++ b/concepts/constant-kv-cache.md
@@ -0,0 +1,39 @@
+---
+title: "Constant KV Cache"
+created: 2026-06-24
+updated: 2026-06-24
+type: concept
+tags: ["kv-cache", "efficient-inference", "attention-mechanism"]
+sources:
+  - "[[unlimited-ocr-works-2026]]"
+---
+
+# Constant KV Cache
+
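+Constant KV Cache is the core property of the R-SWA attention mechanism: throughout decoding, the KV cache size stays bounded by the constant $L_m + n$ and does not grow with the output length $T$.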
+
+## Definition
+
+$$C_{R\text{-}SWA}(T) = L_m + \min(n, T) \leq L_m + n$$
+
+Here $L_m$ is the number of prefix tokens (fixed) and $n$ is the sliding-window width (default 128).
+
+## Comparison with standard MHA
+
+| Mechanism | KV cache growth | Cache size as T → ∞ |
+|-----------|-----------------|---------------------|
+| MHA | O(T), linear | ∞ |
+| R-SWA | O(1), constant | $L_m + n$ |
+
+Cache compression ratio relative to MHA (for $T \geq n$): $\rho(T) = \frac{L_m + n}{L_m + T} \to 0$ as $T \to \infty$
+
+## Engineering implications
+
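+- GPU memory used by the KV cache stays constant and does not grow with output length
+- Inference speed (TPS) stays constant (Flash Attention v3 kernel latency is stable)
+- Makes it possible to parse dozens of pages in a single forward pass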
+
+## References
+- [[unlimited-ocr-works-2026]]
+- [[reference-sliding-window-attention]]
+- [[kv-cache]]
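
Note on the Definition and Comparison sections of `concepts/constant-kv-cache.md`: the bound and the compression ratio can be checked numerically. The sketch below is not part of the commit. It assumes MHA holds $L_m + T$ entries (read off the denominator of $\rho(T)$), uses a hypothetical prefix length of 1024, and takes the window width 128 from the note. The function names are illustrative only. Unlike the note's $\rho(T)$, the ratio here uses $\min(n, T)$ in the numerator, so it also covers $T < n$; the two agree for $T \geq n$.

```python
def rswa_cache_size(prefix_len: int, window: int, generated: int) -> int:
    """KV entries held by R-SWA after `generated` tokens: L_m + min(n, T)."""
    return prefix_len + min(window, generated)


def mha_cache_size(prefix_len: int, generated: int) -> int:
    """KV entries held by standard MHA (assumed L_m + T, linear in T)."""
    return prefix_len + generated


def compression_ratio(prefix_len: int, window: int, generated: int) -> float:
    """R-SWA cache size divided by MHA cache size at the same output length."""
    return rswa_cache_size(prefix_len, window, generated) / mha_cache_size(
        prefix_len, generated
    )


if __name__ == "__main__":
    L_M = 1024  # hypothetical prefix length, not taken from the note
    N = 128     # sliding-window width (the note's stated default)
    for t in (0, 64, 128, 10_000, 1_000_000):
        print(
            f"T={t:>9}  R-SWA={rswa_cache_size(L_M, N, t):>5}  "
            f"MHA={mha_cache_size(L_M, t):>9}  "
            f"ratio={compression_ratio(L_M, N, t):.5f}"
        )
```

Under these assumptions the R-SWA column stops growing once `T` reaches `N`, while the MHA column keeps increasing and the ratio falls toward zero.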