20260420:first commit

2026-04-20 11:42:41 +08:00
commit dd8345a6ea
45 changed files with 2366 additions and 0 deletions
--- a/concepts/kvcache-transfer.md
+++ b/concepts/kvcache-transfer.md
@@ -0,0 +1,38 @@
+---
+title: "KVCache Transfer and Optimization"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [inference, system-design, performance]
+sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md]
+---
+
+# KVCache Transfer and Optimization
+
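+## Definition
+
+KVCache is the Key-Value state cached during LLM inference so that the keys and values of earlier tokens are not recomputed. KVCache transfer is the process, in a disaggregated inference architecture, of moving the KVCache produced during the prefill phase to the decode nodes.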
+
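+## Transfer Bottlenecks
+
+- **Large volume**: in dense-attention models, KVCache size grows linearly with sequence length and scales with model size (number of layers, number of KV heads, and head dimension)
+- **Bandwidth requirements**: conventional architectures rely on low-latency, high-bandwidth networks such as RDMA
+- **Latency sensitivity**: transfer latency directly affects TTFT (Time to First Token)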
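+
+To make the size and latency bottlenecks concrete, here is a back-of-envelope estimate for standard multi-head or grouped-query attention. The symbols and the example configuration are illustrative assumptions, not figures from the cited source: $L$ is the number of layers, $H_{kv}$ the number of KV heads, $d_{head}$ the head dimension, $T$ the sequence length in tokens, $b$ the bytes per element, and $B$ the usable link bandwidth. The factor 2 accounts for storing both keys and values.
+
+$$
+S_{KV} = 2 \cdot L \cdot H_{kv} \cdot d_{head} \cdot T \cdot b, \qquad t_{transfer} \approx \frac{S_{KV}}{B}
+$$
+
+For a hypothetical model with $L = 32$, $H_{kv} = 8$, $d_{head} = 128$, FP16 storage ($b = 2$), and $T = 32768$ tokens, this gives $S_{KV} = 2^{32}$ bytes, or 4 GiB for a single request. Over a 100 Gb/s link (about 12.5 GB/s, ignoring protocol overhead), moving it takes roughly 0.34 s, which adds directly to TTFT unless the transfer is overlapped with computation.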
+
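+## Optimization Directions
+
+### Model Side
+- **Hybrid attention architectures**: reduce KVCache size through structured state-space models or linear attention
+- **KVCache compression**: quantization, sparsification, or distillation techniques
+- **Prefix cache sharing**: multiple requests share the KVCache of a common prefix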
+
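+### System Side
+- **Selective transfer**: transfer only the KVCache layers or tokens that are needed
+- **Bandwidth-aware scheduling**: adjust the transfer strategy dynamically based on network conditions
+- **PrfaaS architecture**: combines model efficiency with system scheduling to enable cross-datacenter transfer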
+
+## Related Concepts
+
+- [[prefill-as-a-service]]: KVCache transfer in the PrfaaS architecture
+- [[prefill-decode-disaggregation]]: prefill-decode (PD) disaggregated architecture
+- [[inference-optimization]]: inference optimization techniques