---
title: "Latent-Action Pretraining"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["vla", "pretraining", "robot-learning", "latent-representation"]
sources:
- "[[vla-jepa-2026]]"
---
# Latent-Action Pretraining
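Latent-Action Pretraining is a pretraining paradigm for learning VLA (vision-language-action) policies from unlabeled video: first learn the representations and transition structure present in the video, then adapt them to downstream control tasks.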
## Standard Pipeline
1. Learn latent action representations from video data
2. Align the latent actions to the real action space
3. Fine-tune the policy on control data
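
The three stages above can be sketched as a toy in plain Python. Everything here is an illustrative assumption rather than any published method: frames are scalars, the latent action is just the sign of the change between frames, and the names `infer_latent_action`, `align_latents`, and `fine_tune` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def infer_latent_action(frame_t, frame_next):
    """Stage 1 (toy): label a transition with a discrete latent code.

    Frames are scalars here; the code is the sign of the change (-1, 0, +1).
    """
    delta = frame_next - frame_t
    return (delta > 0) - (delta < 0)


def align_latents(labeled):
    """Stage 2 (toy): map each latent code to the mean real action seen with it."""
    buckets = defaultdict(list)
    for frame_t, frame_next, action in labeled:
        buckets[infer_latent_action(frame_t, frame_next)].append(action)
    return {z: mean(actions) for z, actions in buckets.items()}


def fine_tune(table, demos, lr=0.5):
    """Stage 3 (toy): nudge the decoded actions toward control demonstrations."""
    tuned = dict(table)
    for state, goal, action in demos:
        z = infer_latent_action(state, goal)
        current = tuned.get(z, 0.0)
        tuned[z] = current + lr * (action - current)
    return tuned


# Unlabeled "video": only frames, no actions.
video = [0.0, 0.4, 0.9, 0.7, 0.7]
latents = [infer_latent_action(a, b) for a, b in zip(video, video[1:])]

# Small labeled set of (frame_t, frame_next, real_action) triples.
labeled = [(0.0, 0.5, 0.5), (0.5, 1.0, 0.7), (1.0, 0.6, -0.4)]
table = align_latents(labeled)

# Control data as (state, goal, demonstrated_action) triples.
tuned = fine_tune(table, [(0.0, 1.0, 1.0)])
```

Each stage consumes the output of the previous one, so an error in the latent codes from stage 1 propagates into both the alignment table and the fine-tuned policy.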
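## Four Failure Modes of Current Methods
VLA-JEPA identifies four systematic flaws:
| Type | Cause | Symptom |
|------|------|------|
| [[appearance-bias-vla\|Appearance bias]] | Pixel-level objective | Learns texture and lighting changes rather than action semantics |
| Noisy motion | Camera motion dominates the signal | Latent actions encode camera shake |
| [[information-leakage-vla\|Information leakage]] | Future frames used as input | Latent actions collapse into an encoding of the future |
| Multi-stage fragility | Complex pipeline | Inconsistency between stages; heavy engineering burden |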
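
The information-leakage row can be illustrated with a toy, assuming scalar frames and hand-written encoders (all function names are hypothetical). When the encoder sees the future frame and the latent is unconstrained, copying the future gives zero reconstruction error, so the loss alone cannot tell an action code from a copy of the next frame. The coarse sign code is included only as a contrast, not as a proposed fix.

```python
def leaky_encoder(frame_t, frame_next):
    # Unconstrained latent: nothing stops it from copying the future frame.
    return frame_next


def leaky_decoder(frame_t, z):
    # The decoder can ignore the current frame entirely.
    return z


def coarse_encoder(frame_t, frame_next):
    # Contrast case: keep only the direction of change (-1, 0, +1).
    delta = frame_next - frame_t
    return (delta > 0) - (delta < 0)


def coarse_decoder(frame_t, z, step=0.5):
    # Must use the current frame, because the code alone is not enough.
    return frame_t + step * z


def reconstruction_error(encoder, decoder, pairs):
    # Mean squared error between the decoded and the true next frame.
    errors = [(decoder(s, encoder(s, s_next)) - s_next) ** 2 for s, s_next in pairs]
    return sum(errors) / len(errors)


pairs = [(0.0, 0.5), (0.5, 0.2), (0.2, 0.9)]
leaky_err = reconstruction_error(leaky_encoder, leaky_decoder, pairs)
coarse_err = reconstruction_error(coarse_encoder, coarse_decoder, pairs)

# The same motion (+0.5) from two different starting frames:
leaky_codes = (leaky_encoder(0.0, 0.5), leaky_encoder(0.2, 0.7))
coarse_codes = (coarse_encoder(0.0, 0.5), coarse_encoder(0.2, 0.7))
```

The leaky latent wins on reconstruction error yet assigns different codes to the same motion, because it describes the future state rather than the action.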
## VLA-JEPA's Fix
Replace pixel prediction with the JEPA paradigm: leakage-free state prediction plus latent-space alignment.
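
A minimal sketch of predicting in latent space instead of pixel space, under loud assumptions: this is not VLA-JEPA's architecture. The encoder here is hand-written (it subtracts mean brightness) where a real one would be learned, the "action" is a cyclic shift, and `encode`, `shift`, and `predict_next_embedding` are hypothetical names. The predictor receives only the current embedding and the action; the future frame is used solely to build the target.

```python
def encode(frame):
    # Hypothetical encoder: remove mean brightness so lighting shifts vanish.
    m = sum(frame) / len(frame)
    return [p - m for p in frame]


def shift(vec, k):
    # Cyclic right shift by k positions; stands in for "motion".
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def predict_next_embedding(embedding_t, action):
    # Sees the current embedding and the action only, never the future frame.
    return shift(embedding_t, action)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


frame_t = [0.0, 1.0, 0.0, 0.0]
action = 1
# True next frame: the same content moved one step, plus a +0.3 lighting change.
frame_next = [0.3, 0.3, 1.3, 0.3]

# Pixel-space objective: penalized for the lighting change it cannot predict.
pixel_loss = mse(shift(frame_t, action), frame_next)

# Latent-space objective: the target is the embedding of the future frame.
target = encode(frame_next)
latent_loss = mse(predict_next_embedding(encode(frame_t), action), target)
```

In this toy the pixel loss stays near 0.09 even though the motion is predicted correctly, while the latent loss is effectively zero, which mirrors the appearance-bias concern with pixel-level objectives.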
## References
- [[vla-jepa-2026]]
- [[vla-vision-language-action]]
- [[leakage-free-state-prediction]]
- [[appearance-bias-vla]]
- [[information-leakage-vla]]