20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/task-invariant-representation.md
+++ b/concepts/task-invariant-representation.md
@@ -0,0 +1,51 @@
+---
+title: "Task-Invariant Representation"
+created: 2026-06-17
+updated: 2026-06-17
+type: concept
+tags: [representation-learning, multi-task, transfer-learning, reinforcement-learning]
+sources: [raw/papers/naveen-repmt-sac-2026.md]
+confidence: high
+---
+
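+# Task-Invariant Representation
+
+A task-invariant representation is the core of [[repmt-sac|RepMT-SAC]]: it captures the **dynamics structure shared by all tasks** and is independent of the reward function.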
+
+## Formulation
+
+Under the [[spectral-mdp-decomposition|spectral MDP decomposition]]:
+
+```
+Q(s,a;τ) = ⟨φ(s,a), w(τ)⟩
+```
+
+- `φ(s,a)`: task-invariant → captures the structure of P(s'|s,a)
+- `w(τ)`: task-specific → encodes information about r(s,a,τ)
+
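+## Why It Works
+
+The multi-task MDP setting assumes that **all tasks share the dynamics P and the state-action space** and differ only in their reward functions. For example, a quadrotor's physical dynamics are the same on every trajectory.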
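+
+One way to see why shared dynamics give a shared φ is the standard low-rank (linear) MDP argument. The sketch below assumes an exactly low-rank transition kernel and a reward that is linear in φ. The symbols θ(τ), γ, π, and V are introduced here for illustration and do not appear in this note, which writes P in an exponentiated form instead.
+
+```latex
+% Assumptions (not from the note): exact low-rank dynamics, reward linear in \phi
+P(s' \mid s,a) = \langle \phi(s,a), \mu(s') \rangle, \qquad
+r(s,a,\tau) = \langle \phi(s,a), \theta(\tau) \rangle
+
+% Bellman equation for a policy \pi on task \tau
+Q^{\pi}(s,a;\tau)
+  = r(s,a,\tau) + \gamma \int P(s' \mid s,a)\, V^{\pi}(s';\tau)\, ds'
+  = \Big\langle \phi(s,a),\;
+      \underbrace{\theta(\tau) + \gamma \int \mu(s')\, V^{\pi}(s';\tau)\, ds'}_{w(\tau)}
+    \Big\rangle
+```
+
+Under these assumptions every task-dependent quantity is collected in w(τ), which also depends on the policy π, while φ depends only on the shared dynamics.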
+
+## Learning
+
+φ and an auxiliary representation µ(s') are learned with contrastive conditional density estimation:
+
+```
+P(s'|s,a) ≈ exp⟨φ(s,a), µ(s')⟩ / Z
+```
+
+With the learned φ, the Q-function of any task can be expressed as the linear combination w(τ)⊤ φ(s,a).
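+
+A contrastive objective for this kind of density model is often written in an InfoNCE style, where the observed next state is scored against other sampled next states. The form below is a generic sketch and not necessarily the loss used by RepMT-SAC. The batch size N, the sampled states s'_j, and the dataset D are assumptions.
+
+```latex
+% Generic InfoNCE-style loss (assumed form).
+% s'_1, ..., s'_N contains the observed s'_i plus sampled negatives.
+\mathcal{L}(\phi,\mu) =
+  -\,\mathbb{E}_{(s_i,a_i,s'_i)\sim\mathcal{D}}
+  \left[ \log
+    \frac{\exp\langle \phi(s_i,a_i), \mu(s'_i) \rangle}
+         {\sum_{j=1}^{N} \exp\langle \phi(s_i,a_i), \mu(s'_j) \rangle}
+  \right]
+```
+
+The sum over sampled next states plays a role analogous to the normalizer Z, so the integral that defines Z is never evaluated directly.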
+
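+## Benefits of Freezing
+
+φ is frozen during downstream adaptation:
+- A new task only requires learning w(τ_new), a low-dimensional parameter
+- Q-learning reduces to linear regression, which is very stable
+- Adaptation needs only a few samples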
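+
+To illustrate the linear-regression point: with φ frozen and the regression targets y_i held fixed, fitting w is a least-squares problem with a closed-form solution. The ridge term λ, the target definition, and the feature matrix Φ are assumptions made for this sketch.
+
+```latex
+% Features are frozen: \Phi \in \mathbb{R}^{n \times d}, row i is \phi(s_i,a_i)^\top
+% Targets (assumed): y_i = r_i + \gamma\, \langle \phi(s'_i,a'_i), w_{\text{old}} \rangle
+\hat{w}(\tau_{\text{new}})
+  = \arg\min_{w} \sum_{i=1}^{n} \big( \langle \phi(s_i,a_i), w \rangle - y_i \big)^2
+    + \lambda \lVert w \rVert_2^2
+  = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y
+```
+
+Only the d entries of w are estimated, which is consistent with adaptation from a small number of samples.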
+
+## References
+
+- [[spectral-mdp-decomposition|Spectral MDP Decomposition]]
+- [[task-conditioned-policy|Task-Conditioned Policy]]
+- [[rep-mt-sac|RepMT-SAC]]