---
title: "Constant KV Cache"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["kv-cache", "efficient-inference", "attention-mechanism"]
sources:
- "[[unlimited-ocr-works-2026]]"
---
# Constant KV Cache
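Constant KV Cache is the core property of the R-SWA attention mechanism. Throughout decoding, the KV cache size stays bounded by the constant $L_m + n$ and does not grow with the output length $T$.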
## Definition
$$C_{R\text{-}SWA}(T) = L_m + \min(n, T) \leq L_m + n$$
where $L_m$ is the (fixed) number of prefix tokens and $n$ is the sliding-window width (default 128).
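
The definition can be sketched as a cache with two parts: a prefix that is never evicted and a window that keeps only the $n$ most recent entries. This is a minimal sketch under stated assumptions. The class `RSWACache` and the helper `rswa_cache_size` are hypothetical names. The ring-buffer eviction policy is an assumption, because the source states only the resulting size bound and not how eviction is implemented. Placeholder strings stand in for real key/value tensors.

```python
from collections import deque


class RSWACache:
    """Hypothetical bounded KV cache: a fixed prefix of L_m entries plus a
    sliding window that keeps only the n most recent decoded entries."""

    def __init__(self, prefix_kv, window=128):
        self.prefix = list(prefix_kv)       # L_m prefix entries, never evicted
        self.window = deque(maxlen=window)  # deque drops its oldest item once full

    def append(self, kv):
        """Add the (key, value) pair produced for one newly decoded token."""
        self.window.append(kv)

    def __len__(self):
        return len(self.prefix) + len(self.window)


def rswa_cache_size(T, L_m, n=128):
    """Closed form from the definition: C_{R-SWA}(T) = L_m + min(n, T)."""
    return L_m + min(n, T)


# Placeholder strings stand in for real key/value tensors (assumption).
cache = RSWACache(prefix_kv=[("k_p", "v_p")] * 16, window=128)
for t in range(1, 1001):
    cache.append((f"k{t}", f"v{t}"))
    # The running size always matches the closed form L_m + min(n, t).
    assert len(cache) == rswa_cache_size(t, L_m=16, n=128)
print(len(cache))  # 16 + 128 = 144
```

A production kernel would more likely preallocate a fixed tensor and write into slot `t % n` than use a Python `deque`. The size behaviour is the same either way.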
## Comparison with Standard MHA
| Mechanism | KV cache growth | As $T \to \infty$ |
|------|-------------|----------|
| MHA | $O(T)$, linear | $\infty$ |
| R-SWA | $O(1)$, constant | $L_m + n$ |

Cache compression ratio relative to MHA (for $T \geq n$): $\rho(T) = \frac{L_m + n}{L_m + T} \to 0$ as $T \to \infty$.
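
The ratio can be checked numerically. The sketch below uses the full definition $C_{R\text{-}SWA}(T) = L_m + \min(n, T)$ in the numerator, so it also covers $T < n$, where both caches hold the same number of entries. For $T \geq n$ it reduces to the formula above. `compression_ratio` is a hypothetical helper, and the prefix length `L_m = 256` is an illustrative value that the source does not give.

```python
def compression_ratio(T, L_m, n=128):
    """Ratio of R-SWA cache size to full-attention (MHA) cache size after T
    decoded tokens: (L_m + min(n, T)) / (L_m + T)."""
    return (L_m + min(n, T)) / (L_m + T)


L_m = 256  # hypothetical prefix length, not from the source
for T in (64, 128, 1_000, 10_000, 100_000):
    print(T, round(compression_ratio(T, L_m), 4))
```

The ratio stays at exactly 1 until $T$ reaches $n$. After that it decays roughly as $(L_m + n)/T$.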
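## Engineering Implications
- GPU memory usage stays constant and does not grow with output length.
- Inference speed (tokens per second, TPS) stays constant, since Flash Attention v3 kernel latency remains stable.
- Parsing dozens of pages in a single pass becomes feasible.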
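
To make the memory claim concrete, the sketch below converts cache length into bytes with the usual per-token estimate of 2 (keys and values) × layers × KV heads × head dimension × bytes per element. The model configuration (`num_layers=24`, `num_kv_heads=8`, `head_dim=128`, fp16) and `L_M = 256` are hypothetical values chosen for illustration. The source does not specify them, and the estimate assumes every layer keeps the same cache layout.

```python
def kv_cache_bytes(tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for `tokens` positions.
    The leading factor 2 accounts for storing both K and V."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical model configuration (not taken from the source).
CFG = dict(num_layers=24, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
L_M, WINDOW = 256, 128  # hypothetical prefix length; default window width n

for T in (1_000, 32_768, 1_000_000):
    full = kv_cache_bytes(L_M + T, **CFG)                 # MHA: L_m + T entries
    bounded = kv_cache_bytes(L_M + min(WINDOW, T), **CFG)  # R-SWA: L_m + min(n, T)
    print(f"T={T:>9,}: full attention {full / 2**20:10.1f} MiB, R-SWA {bounded / 2**20:.1f} MiB")
```

Under these assumptions, the R-SWA cache stays at 384 tokens (36 MiB) for every $T \geq n$. The full-attention cache grows linearly and already reaches about 3 GiB at $T = 32{,}768$.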
## References
- [[unlimited-ocr-works-2026]]
- [[reference-sliding-window-attention]]
- [[kv-cache]]