---
title: "Latent World Model (Robotics)"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["world-model", "jepa", "robot-learning", "latent-representation"]
sources:
- "[[vla-jepa-2026]]"
---
# Latent World Model (Embodied)
The Latent World Model is the world-model component of VLA-JEPA. Following the JEPA paradigm, it models state-transition dynamics in latent space.
## Architecture
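- **Target Encoder**: V-JEPA2 (frozen); produces latent world state targets from future frames.
- **Predictor**: autoregressive Transformer (12 layers, 8 attention heads, 2048-dim).
- **Attention**: bidirectional within a single time step (K latent action tokens + N image latent tokens), causal across time steps.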
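The attention pattern described above (bidirectional inside a time step, causal across time steps) can be illustrated with a block-causal mask. The following is a minimal sketch, not the VLA-JEPA implementation: the function name `block_causal_mask`, the token ordering (all K + N tokens of a step stored contiguously), and the sizes `K`, `N`, `T` are assumptions made for illustration.

```python
def block_causal_mask(num_steps: int, tokens_per_step: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query token i may attend to key token j.

    Assumption: tokens are laid out step by step, so token i belongs to
    time step i // tokens_per_step.
    """
    total = num_steps * tokens_per_step
    mask = []
    for i in range(total):
        step_i = i // tokens_per_step
        # A token sees every token of its own step (bidirectional) and of
        # all earlier steps (causal), but nothing from later steps.
        row = [(j // tokens_per_step) <= step_i for j in range(total)]
        mask.append(row)
    return mask


# Hypothetical sizes: K latent action tokens and N image latent tokens per step.
K, N, T = 1, 2, 3
mask = block_causal_mask(num_steps=T, tokens_per_step=K + N)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed grid forms a staircase of (K + N)-wide blocks: each step's tokens see one another in full, plus everything from earlier steps. In a tensor framework the same boolean pattern would typically be passed to the attention layer as its mask.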
## Training objective
$$\mathcal{L}_{WM} = \sum_{k=1}^{T} \mathbb{E}_{s_{t_k} \sim F(\cdot)} \left\| \hat{s}_{t_k} - s_{t_k} \right\|$$
The target encoder F(·) provides the ground-truth world state, and the predictor learns to predict it.
The objective can be interpreted as ELBO maximization:
$$\log p(s_{1:T} | z_{0:T-1}) \geq \sum \mathbb{E}[\log p_\theta(\hat{s} | s)] - D_{KL}(F \| p_\theta^{WM})$$
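The prediction loss above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the note does not say which distance measures the gap between predicted and target states, so the code assumes a mean absolute error per step; `world_model_loss` and the toy values are hypothetical; and the expectation over F(·) is approximated by a single sample.

```python
def world_model_loss(predicted: list[list[float]],
                     target: list[list[float]]) -> float:
    """Sum over time steps of the per-step distance between latent states.

    predicted: T predicted latent states from the predictor.
    target:    T target latent states from the frozen target encoder.
    Assumption: the per-step distance is the mean absolute error.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must cover the same time steps")
    total = 0.0
    for s_hat, s in zip(predicted, target):
        if len(s_hat) != len(s):
            raise ValueError("latent states must have the same dimension")
        total += sum(abs(a - b) for a, b in zip(s_hat, s)) / len(s)
    return total


# Toy 2-dimensional latent states for T = 2 steps (made-up values).
targets = [[0.0, 1.0], [2.0, 2.0]]   # stand-ins for frozen-encoder outputs
preds = [[0.5, 1.0], [2.0, 1.0]]     # stand-ins for predictor outputs
loss = world_model_loss(preds, targets)
print(loss)
```

Because the target encoder is frozen, the targets act as constants: in an autodiff framework, gradients would flow only through the predicted states.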
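## Differences from general world models
Unlike pixel-reconstruction-based world models such as Dreamer, the Latent World Model operates in a semantic latent space, which naturally filters out pixel-level noise.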
## References
- [[vla-jepa-2026]]
- [[jepa]]
- [[world-model-lecun]]
- [[leakage-free-state-prediction]]