---
title: "Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction"
authors: ["Ziyao Tang", "Pengkun Jiao", "Xinhang Chen", "Wei Liu", "Shiyong Li", "Jingjing Chen"]
date: 2026-02-09
arxiv_id: "2602.08585v2"
categories: ["cs.LG", "cs.AI"]
venue: "ICML 2026"
affiliations: ["Fudan University", "Baidu Inc. (Baige AI Team)"]
paper_type: "conference"
---
# Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction
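## Abstract
The linear memory growth of the KV cache is a core bottleneck for long-context inference in large language models. Existing KV cache eviction methods rely on instantaneous heuristic metrics and assume that attention scores are an equally reliable proxy for importance across all heads. However, attention heads are heterogeneous in predictive fidelity: some heads emphasize immediate contribution, while others capture long-horizon utility. This paper proposes the LU-KV framework, which models head-level budget allocation as a global combinatorial optimization problem, obtains a near-optimal solution through convex-hull relaxation and a marginal-utility greedy solver, and designs an offline profiling protocol to support practical deployment. On LongBench and RULER, it achieves minimal performance loss at an 80% KV cache compression rate.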
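## Core Contributions
1. Identifies a key gap (the optimality gap) between heuristic importance metrics and long-horizon marginal utility
2. Formalizes budget allocation as a long-term utility maximization problem and proposes a solver that combines convex-hull relaxation with a marginal-utility greedy procedure
3. Designs a data-driven offline profiling protocol that makes the theoretical optimization deployable in real inference
4. Metric-agnostic: compatible with multiple intra-head scoring methods such as SnapKV, KeyDiff, CAKE, and KVZip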
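## Key Concepts
- [[oracle-importance]]: Oracle importance, based on a token's maximum potential contribution to the output vector within the future decoding window
- [[optimality-gap]]: the optimality gap between a heuristic metric and the Oracle metric
- [[long-horizon-utility]]: long-horizon utility, as distinct from instantaneous attention scores
- [[global-combinatorial-optimization]]: the combinatorial-optimization formulation of global budget allocation
- [[convex-hull-relaxation]]: convex relaxation of a discrete loss sequence using isotonic regression methods such as PAVA
- [[marginal-utility]]: marginal utility, which drives the greedy allocation strategy
- [[offline-profiling]]: a three-stage offline calibration: synthetic context → Oracle computation → profile aggregation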
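
The convex-hull relaxation and marginal-utility entries can be illustrated with a small sketch. It assumes a hypothetical profiled curve `loss[b]` (the loss of one head at budget `b`, in equal budget steps) and applies pool-adjacent-violators to the marginal gains so that they become non-increasing; with equal steps this should coincide with the lower convex envelope of the curve. The function names and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List


def pava_nonincreasing(values: List[float]) -> List[float]:
    """Least-squares non-increasing fit via pool-adjacent-violators (unit weights)."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block has a smaller mean than the block after it.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def convexify_loss(loss: List[float]) -> List[float]:
    """Return a convexified copy of loss[b], b = 0, 1, 2, ... (hypothetical profile)."""
    # Marginal gain of the (b+1)-th budget unit.
    gains = [loss[b] - loss[b + 1] for b in range(len(loss) - 1)]
    smooth = pava_nonincreasing(gains)
    convex = [loss[0]]
    for g in smooth:
        convex.append(convex[-1] - g)
    return convex


# Toy profile: the second unit helps more than the first, so the curve is not convex.
print(convexify_loss([10.0, 9.0, 5.0, 4.0, 4.0]))  # [10.0, 7.5, 5.0, 4.0, 4.0]
```

After this step the marginal gains are `[2.5, 2.5, 1.0, 0.0]`, which is the diminishing-returns shape a greedy allocator relies on.
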
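## Experimental Results
- LongBench: at an 80% compression rate, LU-KV outperforms baselines such as Uniform, PyramidKV, and AdaKV across the board on Llama-3.1-8B, Mistral-7B, and Qwen2.5-32B
- RULER: maintains retrieval robustness across extended context windows from 4K to 128K
- Offline profiles transfer highly consistently across tasks (transferability)
- Compatible with multiple intra-head metrics, including SnapKV, KeyDiff, CAKE, and KVZip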
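## Method Framework
LU-KV adopts a two-stage paradigm:
1. **Intra-head scoring**: score and rank tokens with an arbitrary heuristic metric π
2. **Cross-head budget allocation**: determine the optimal budget b_{,h} for each head through global combinatorial optimization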
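
The two stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `keep_top_tokens`, `allocate_budgets`, the head labels, and the per-head marginal-gain tables are all hypothetical. It further assumes each head's gains are already non-increasing (for example after a convex relaxation), in which case taking the largest remaining gain at every step maximizes the total gain for a separable objective.

```python
import heapq
from typing import Dict, List


def keep_top_tokens(scores: List[float], budget: int) -> List[int]:
    """Stage 1 (intra-head): keep the `budget` highest-scoring token positions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def allocate_budgets(gains: Dict[str, List[float]], total_budget: int) -> Dict[str, int]:
    """Stage 2 (cross-head): greedy allocation by marginal utility.

    gains[h][k] is the assumed loss reduction from giving head h its (k+1)-th
    budget unit; each list is assumed to be non-increasing.
    """
    budgets = {h: 0 for h in gains}
    # Max-heap via negated gains, holding the next available unit of each head.
    heap = [(-g[0], h) for h, g in gains.items() if g]
    heapq.heapify(heap)
    remaining = total_budget
    while remaining > 0 and heap:
        _, h = heapq.heappop(heap)
        budgets[h] += 1
        remaining -= 1
        k = budgets[h]
        if k < len(gains[h]):
            heapq.heappush(heap, (-gains[h][k], h))
    return budgets


# Toy profile for three heads and a global budget of four units.
profile = {
    "L0H0": [5.0, 1.0, 0.5],
    "L0H1": [3.0, 2.5, 2.0],
    "L1H0": [0.2, 0.1, 0.0],
}
budgets = allocate_budgets(profile, total_budget=4)
print(budgets)  # {'L0H0': 1, 'L0H1': 3, 'L1H0': 0}
print(keep_top_tokens([0.1, 0.9, 0.3, 0.7], budgets["L0H0"]))  # [1]
```

In this toy case the head whose gains decay slowly (`L0H1`) receives most of the budget, while a head with negligible gains receives none, which is the kind of non-uniform allocation a global formulation allows.
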
Core decomposition: `Eviction Loss = Oracle Metric Loss + Optimality Gap Loss`
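
One way to read this decomposition is given below. The notation is introduced here for illustration only and is an assumption rather than the paper's: $\mathcal{L}(\pi, b)$ denotes the loss from evicting with metric $\pi$ at budget $b$, and $\pi^{\star}$ denotes the Oracle metric.

$$
\underbrace{\mathcal{L}(\pi, b)}_{\text{Eviction Loss}}
= \underbrace{\mathcal{L}(\pi^{\star}, b)}_{\text{Oracle Metric Loss}}
+ \underbrace{\mathcal{L}(\pi, b) - \mathcal{L}(\pi^{\star}, b)}_{\text{Optimality Gap Loss}}
$$

Under this reading, the first term is the loss that remains even with ideal token ranking at the same budget, and the second term is the extra loss attributable to using a heuristic in place of the Oracle.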
## References
- SnapKV (Li et al., 2024)
- H2O (Zhang et al., 2023)
- PyramidKV (Cai et al., 2024)
- AdaKV (Feng et al., 2026b)
- KeyDiff (Park et al., 2025)
- CriticalKV (Feng et al., 2025)
- KVZip (Kim et al., 2026)
- CAKE (Qin et al., 2025)