20260420:first commit

2026-04-20 11:42:41 +08:00
commit dd8345a6ea
45 changed files with 2366 additions and 0 deletions
--- a/concepts/prefill-as-a-service.md
+++ b/concepts/prefill-as-a-service.md
@@ -0,0 +1,59 @@
+---
+title: "Prefill-as-a-Service (PrfaaS)"
+created: 2026-04-19
+updated: 2026-04-19
+type: concept
+tags: [inference, system-design, architecture]
+sources: [raw/papers/qin-prfaas-cross-datacenter-2026.md]
+---
+
+# Prefill-as-a-Service (PrfaaS)
+
+**Proposed by:** Qin et al. (2026) · arXiv:2604.15039
+
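+## Definition
+
+PrfaaS is a cross-datacenter LLM serving architecture. It selectively offloads long-context prefill to a standalone, compute-dense cluster and ships the resulting KVCache over commodity Ethernet to a local decode cluster, so that prefill and decode capacity can scale independently.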
+
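+## Motivation
+
+Conventional [[prefill-decode-disaggregation]] architectures separate the compute-bound prefill phase from the memory-bound decode phase, but they are constrained by the cost of transferring the KVCache:
+- **Dense-attention models**: the KVCache is very large and requires a low-latency RDMA network
+- **Hybrid-attention models**: the KVCache is much smaller, but real workload characteristics (bursts, skewed request lengths, bandwidth fluctuation) still leave a naive externalized design prone to congestion and low utilization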
+
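+## Architecture
+
+### Core components
+1. **Standalone prefill cluster**: compute-dense, dedicated to long-context prefill
+2. **Local PD cluster**: runs decode once it has received the KVCache
+3. **Bandwidth-aware scheduler**: dynamically adjusts the offloading policy as cross-datacenter bandwidth fluctuates
+4. **Cache-aware request placement**: uses existing prefix caches to optimize request routing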
+
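+### Key techniques
+- **Selective offloading**: only long-context requests have their prefill offloaded across datacenters
+- **Efficient KVCache transfer**: carried over commodity Ethernet (no RDMA required)
+- **System-side and model-side co-design**: combines the model's KV efficiency optimizations with system-level scheduling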
+
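+## Performance
+
+Measured on an internal 1T-parameter hybrid model:
+- Throughput is **54%** higher than a homogeneous PD deployment
+- Throughput is **32%** higher than a naive heterogeneous baseline
+- Cross-datacenter bandwidth consumption is modest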
+
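+## Significance
+
+PrfaaS removes the requirement that heterogeneous accelerators share a single low-latency RDMA fabric. LLM serving can therefore be deployed more flexibly across loosely coupled clusters, which offers a new architectural paradigm for cloud-native LLM serving.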
+
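+## Open questions
+
+- How should the threshold for prefill offloading be chosen adaptively?
+- What isolation and scheduling policies does PrfaaS need in multi-tenant environments?
+- Where does its applicability end for pure dense-attention models?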
+
+## Related concepts
+
+- [[qin-prfaas-cross-datacenter]]: original paper
+- [[prefill-decode-disaggregation]]: PD disaggregation architecture
+- [[kvcache-transfer]]: KVCache transfer optimization
+- [[hybrid-attention-models]]: hybrid-attention architectures
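
As a rough illustration of the selective offloading and bandwidth-aware scheduling the note describes, the sketch below sends a request to the remote prefill cluster only when its prompt is long and the estimated KVCache transfer time fits a budget at the currently observed bandwidth. Every name and number here (`Request`, `kv_transfer_seconds`, `should_offload`, `min_offload_tokens`, `kv_bytes_per_token`, `max_transfer_seconds`) is a hypothetical chosen for illustration and is not taken from the paper. Cache-aware placement is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A single inference request (hypothetical minimal shape)."""
    prompt_tokens: int  # number of tokens that need prefill


def kv_transfer_seconds(tokens: int,
                        kv_bytes_per_token: int,
                        bandwidth_bytes_per_s: float) -> float:
    """Estimate the time to ship the KVCache for `tokens` tokens.

    Assumption: KVCache size grows linearly with token count, and the
    link delivers `bandwidth_bytes_per_s` for the whole transfer.
    """
    if bandwidth_bytes_per_s <= 0:
        return float("inf")  # no usable link: transfer never completes
    return tokens * kv_bytes_per_token / bandwidth_bytes_per_s


def should_offload(req: Request,
                   *,
                   min_offload_tokens: int,
                   kv_bytes_per_token: int,
                   bandwidth_bytes_per_s: float,
                   max_transfer_seconds: float) -> bool:
    """Decide whether to prefill remotely (True) or locally (False)."""
    # Selective offloading: short prompts always stay local.
    if req.prompt_tokens < min_offload_tokens:
        return False
    # Bandwidth awareness: offload only if the KVCache can cross the
    # inter-datacenter link within the transfer budget right now.
    eta = kv_transfer_seconds(req.prompt_tokens,
                              kv_bytes_per_token,
                              bandwidth_bytes_per_s)
    return eta <= max_transfer_seconds
```

With illustrative values of `min_offload_tokens=32_000`, `kv_bytes_per_token=20_000`, and `max_transfer_seconds=2.0`, a 100,000-token prompt is offloaded when the link delivers 1.25 GB/s and kept local when the link drops to 0.5 GB/s. The same threshold that the open questions ask about choosing adaptively appears here as a fixed parameter.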