20260625: lots of new content

2026-06-25 14:08:47 +08:00
parent 91fac5b6fc
commit 6021dea160
375 changed files with 19263 additions and 251 deletions
--- a/concepts/information-leakage-vla.md
+++ b/concepts/information-leakage-vla.md
@@ -0,0 +1,36 @@
+---
+title: "Information Leakage in VLA"
+created: 2026-06-24
+updated: 2026-06-24
+type: concept
+tags: ["vla", "pretraining", "information-leakage", "shortcut-learning"]
+sources:
+  - "[[vla-jepa-2026]]"
+---
+
+# Information Leakage in VLA Pretraining
+
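+Information leakage is a key failure mode in VLA latent-action pretraining: when training lets future information reach the latent-action encoder, the model learns a shortcut that encodes the future instead of capturing transition dynamics.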
+
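+## How It Happens
+
+Conventional latent-action methods feed the current and future observations into the same module:
+- I_t + I_{t+1} → latent action → reconstruction/prediction target
+- The latent action can encode information about I_{t+1} directly
+- The learned "action" is semantically empty: it can minimize the training loss, yet it carries none of the meaningful transition factors needed for control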
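+
+A minimal formalization of the leaky setup above, in notation that is assumed here rather than taken from the source ($E$ is the latent-action encoder, $D$ the decoder, $z_t$ the latent action, and the squared-error loss is one illustrative choice):
+
+$$
+z_t = E(I_t, I_{t+1}), \qquad \mathcal{L}_{\text{leaky}} = \lVert D(I_t, z_t) - I_{t+1} \rVert^2
+$$
+
+If $z_t$ has enough capacity, a degenerate solution in which $z_t$ simply carries a copy of $I_{t+1}$ and $D$ reads it back out drives $\mathcal{L}_{\text{leaky}}$ toward zero without representing how $I_t$ transitions to $I_{t+1}$.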
+
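+## How VLA-JEPA Eliminates It
+
+VLA-JEPA imposes a strict architectural constraint:
+- The target encoder produces a latent target from the future frame → stop-gradient
+- The student sees only the current observation → the future is inaccessible
+- The student must predict the target without seeing the future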
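+
+One way to write this constraint down, as a sketch in assumed notation ($f_{\bar\theta}$ for the target encoder, $g_\phi$ for the student, $\operatorname{sg}$ for stop-gradient; the exact loss and any additional student inputs are not specified in this note):
+
+$$
+y_{t+1} = \operatorname{sg}\big(f_{\bar\theta}(I_{t+1})\big), \qquad \hat{y}_{t+1} = g_\phi(I_t), \qquad \mathcal{L} = \lVert \hat{y}_{t+1} - y_{t+1} \rVert^2
+$$
+
+Here $I_{t+1}$ never appears as an input to $g_\phi$; it enters only through the stop-gradient target, so the student cannot copy the future frame into its representation.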
+
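+## Comparison with Other Leakage Problems
+
+Unlike token leakage in NLP or augmentation leakage in CV, information leakage in VLA refers specifically to **future temporal information** contaminating a representation that should encode dynamics.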
+
+## References
+- [[vla-jepa-2026]]
+- [[latent-action-pretraining]]
+- [[leakage-free-state-prediction]]