20260429: some new stuff

2026-04-29 16:28:13 +08:00
parent 0b1535dfaf
commit 56c4d3ef7c
70 changed files with 2798 additions and 3 deletions
--- a/concepts/compressed-sparse-attention.md
+++ b/concepts/compressed-sparse-attention.md
@@ -0,0 +1,50 @@
+---
+title: "Compressed Sparse Attention (CSA)"
+domain: "Deep Learning / Attention Mechanisms"
+tags: [attention, long-context, transformer, architecture]
+sources: [[deepseek-v4-million-token-context]]
+---
+
+# Compressed Sparse Attention (CSA)
+
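+> **Type**: Concept (Tier 1: Core)
+> **Source**: [[deepseek-v4-million-token-context]]
+
+## Definition
+
+CSA (Compressed Sparse Attention) is a hybrid attention mechanism introduced in DeepSeek-V4. Its core idea is to first compress the KV cache along the sequence dimension and then run DeepSeek Sparse Attention (DSA) on the compressed representation, which substantially reduces compute and storage costs at long context lengths.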
+
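+## Core Mechanism
+
+### 1. KV Cache Compression
+- The Key and Value matrices are compressed along the sequence dimension, with a **Lightning Indexer** selecting which KV entries are most relevant and should be kept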
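+- The compressed KV cache is far smaller than the original representation (on the order of a few percent of the baseline at 1M context, per the efficiency figures in section 3)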
+
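+### 2. Sparse Attention
+- DeepSeek Sparse Attention (DSA) runs on the compressed KV
+- A sliding-window mechanism is combined with it so that local context is not lost
+- A Multi-Query Attention variant is used, in which the query heads share the same Key-Value entries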
+
+### 3. Efficiency Analysis
+- Relative to a BF16 GQA8 baseline, the 4.3-layer KV cache is only about 2% of the baseline size (at 1M-token context)
+- The attention-score computation inside the indexer runs in FP4 precision, which speeds it up further
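+
+As a rough sense of scale for the baseline in that comparison, the per-layer KV cache size can be worked out directly. This is only a sketch under stated assumptions: "GQA8" is read as 8 KV heads, the head dimension d_h = 128 is a hypothetical value not given in the source, b = 2 bytes for BF16, and L = 2^20 tokens.
+
+$$
+M_{\text{base}} = 2 \cdot n_{kv} \cdot d_h \cdot b \cdot L
+= 2 \cdot 8 \cdot 128 \cdot 2 \cdot 2^{20}\ \text{bytes}
+= 2^{32}\ \text{bytes} = 4\ \text{GiB per layer}
+$$
+
+$$
+0.02 \cdot M_{\text{base}} \approx 0.08\ \text{GiB} \approx 82\ \text{MiB per layer}
+$$
+
+Under these assumed values, a cache at about 2% of the baseline would occupy roughly 82 MiB per layer at 1M tokens instead of 4 GiB. The leading factor of 2 counts Keys and Values separately.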
+
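+## Relationship to HCA
+
+CSA and [[heavily-compressed-attention]] (HCA) together form the [[hybrid-attention-architecture]] of DeepSeek-V4:
+- **CSA**: moderate compression + sparse attention (retains more local information)
+- **HCA**: aggressive compression + dense attention (maximizes global efficiency)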
+
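+## Mathematical Principle
+
+Given an input sequence of length L and a compression ratio r, CSA compresses the KV from L × d to L/r × d. Queries are not compressed, so each of the L queries scores against L/r entries, and dense attention over the compressed KV costs O(L²d/r) rather than O(L²d): the reduction is a factor of r, not r². Sparse selection then restricts each query to a subset of the compressed entries, which lowers the core attention cost further.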
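+
+The cost accounting can be written out term by term. This is a sketch, not the source's own derivation: k (compressed entries selected per query) and w (sliding-window size) are symbols introduced here for illustration, and the numeric values below are hypothetical.
+
+$$
+\begin{aligned}
+C_{\text{full}} &= O(L^2 d) \\
+C_{\text{compressed, dense}} &= O\!\left(L \cdot \tfrac{L}{r} \cdot d\right) = O\!\left(\tfrac{L^2 d}{r}\right) \\
+C_{\text{sparse core}} &= O\!\left(L\,(k + w)\,d\right)
+\end{aligned}
+$$
+
+For example, with L = 2^20 = 1,048,576, r = 4, k = 1024 and w = 128 (all assumed values), the compressed cache holds L/r = 262,144 entries, and each query attends to k + w = 1152 of them, about 0.44% of the compressed entries and about 0.11% of the original L positions. If the indexer still scores every compressed entry for every query, as a top-k selector generally must, a term proportional to L²/r remains, but it is paid at the indexer's smaller width and lower precision rather than at the full attention dimension d.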
+
+## Related Concepts
+
+- [[heavily-compressed-attention]]: Heavily Compressed Attention (HCA)
+- [[hybrid-attention-architecture]]: hybrid attention architecture
+- [[million-token-context]]: million-token context
+
+---
+
+*Last Updated: 2026-04-27*