20260625: lots of new content
41
concepts/latent-action-pretraining.md
Normal file
@@ -0,0 +1,41 @@
---
title: "Latent-Action Pretraining"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["vla", "pretraining", "robot-learning", "latent-representation"]
sources:
- "[[vla-jepa-2026]]"
---

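# Latent-Action Pretraining

Latent-action pretraining is a pretraining paradigm for learning VLA (vision-language-action) policies from unlabeled video: the model first learns representations and transition structure from video, then is adapted to downstream control tasks.

## Standard pipeline

1. Learn a latent action representation from video data.
2. Align the latent actions to the real action space.
3. Fine-tune the policy on control data.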
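
The data flow through these three stages can be sketched on a toy 1-D world. This is only an illustration, not the method of any particular paper: states are integers, a "video" is a list of states, and every function name (`learn_latent_actions`, `align_latents`, `pretrain_policy`, `fine_tune`) is hypothetical. The pseudo-labeling step between stages 2 and 3 is an added assumption about how the aligned latents reach the policy.

```python
from collections import Counter, defaultdict

def learn_latent_actions(videos):
    """Stage 1: build a latent-action codebook from unlabeled state sequences.

    Toy stand-in for representation learning: every distinct
    frame-to-frame change becomes one latent code.
    """
    deltas = sorted({b - a for v in videos for a, b in zip(v, v[1:])})
    return {delta: code for code, delta in enumerate(deltas)}

def align_latents(codebook, labeled):
    """Stage 2: map latent codes to real actions by majority vote
    over a small set of (state, action, next_state) transitions."""
    votes = defaultdict(Counter)
    for state, action, next_state in labeled:
        code = codebook.get(next_state - state)
        if code is not None:
            votes[code][action] += 1
    return {code: c.most_common(1)[0][0] for code, c in votes.items()}

def pretrain_policy(videos, codebook, code_to_action):
    """Assumed bridge: pseudo-label the video with aligned actions
    and fit a tabular state -> action policy."""
    votes = defaultdict(Counter)
    for v in videos:
        for a, b in zip(v, v[1:]):
            action = code_to_action.get(codebook[b - a])
            if action is not None:
                votes[a][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def fine_tune(policy, demos):
    """Stage 3: labeled control data overrides or extends the policy."""
    tuned = dict(policy)
    for state, action, _next_state in demos:
        tuned[state] = action
    return tuned

# Unlabeled videos of an agent moving toward position 3.
videos = [[0, 1, 2, 3], [1, 2, 3], [5, 4, 3]]
codebook = learn_latent_actions(videos)
code_to_action = align_latents(codebook, [(0, "right", 1), (2, "left", 1)])
policy = fine_tune(
    pretrain_policy(videos, codebook, code_to_action),
    demos=[(3, "stay", 3)],
)
print(policy)
```

Only two labeled transitions are needed to name the latent codes here; the video supplies the rest of the policy, which is the appeal of the paradigm. Real systems replace each toy function with a learned model.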
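
## Four failure modes of current methods

VLA-JEPA identifies four systematic flaws:

| Type | Cause | Symptom |
|------|------|------|
| [[appearance-bias-vla\|Appearance bias]] | Pixel-level objective | Learns texture and lighting changes rather than action semantics |
| Noisy motion | Camera motion dominates the signal | Latent actions encode camera shake |
| [[information-leakage-vla\|Information leakage]] | The future is fed in as input | Latent actions collapse into an encoding of the future |
| Multi-stage fragility | Complex pipeline | Inconsistencies between stages, heavy engineering burden |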
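
The information-leakage row can be made concrete with a toy example. The sketch below is an illustration under simple assumptions (integer states, hand-written encoders, all names hypothetical), not an analysis from the source. A latent that merely copies the future reaches zero training error, yet carries no reusable action: replaying it from a different start state fails, while a bottlenecked latent that keeps only the direction of change transfers cleanly.

```python
def leaky_encode(state, next_state):
    """Unconstrained latent: the encoder sees the future and copies it."""
    return next_state

def leaky_predict(state, latent):
    """Predictor that reads the answer off the latent and ignores the state."""
    return latent

def action_encode(state, next_state):
    """Bottlenecked latent: keeps only the direction of change (-1, 0, +1)."""
    change = next_state - state
    return (change > 0) - (change < 0)

def action_predict(state, latent):
    """Predictor that applies the latent action to the current state."""
    return state + latent

def mean_abs_error(encode, predict, transitions, transfer_state=None):
    """Prediction error; optionally replay each latent from another start state."""
    total = 0
    for state, next_state in transitions:
        latent = encode(state, next_state)
        if transfer_state is None:
            total += abs(predict(state, latent) - next_state)
        else:
            # Ground truth when the same action is applied elsewhere.
            target = transfer_state + (next_state - state)
            total += abs(predict(transfer_state, latent) - target)
    return total / len(transitions)

transitions = [(0, 1), (1, 2), (5, 4)]  # unit steps left or right

train_leaky = mean_abs_error(leaky_encode, leaky_predict, transitions)
train_action = mean_abs_error(action_encode, action_predict, transitions)
transfer_leaky = mean_abs_error(
    leaky_encode, leaky_predict, transitions, transfer_state=10)
transfer_action = mean_abs_error(
    action_encode, action_predict, transitions, transfer_state=10)

print(train_leaky, train_action, transfer_leaky, transfer_action)
```

Both variants look perfect on the training objective, so the loss alone cannot reveal the collapse; only the transfer check separates them.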

## VLA-JEPA's fix

VLA-JEPA replaces pixel prediction with the JEPA paradigm: leakage-free state prediction plus latent-space alignment.
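
To see why moving the objective from pixels to latents matters, the toy below compares the two losses on 1-D frames. It is a hedged sketch of the general JEPA idea, not VLA-JEPA's actual architecture: `target_encode` is a hand-written stand-in for a learned target encoder, `predict_latent` is a hypothetical predictor, and the future frame is used only as the prediction target, never as an input.

```python
def render(position, brightness, width=5):
    """Toy 1-D frame: uniform background plus one bright object pixel."""
    return [brightness + (1.0 if i == position else 0.0) for i in range(width)]

def target_encode(frame):
    """Hypothetical target encoder: keeps object position, discards lighting."""
    return float(max(range(len(frame)), key=frame.__getitem__))

def predict_latent(latent_state, latent_action):
    """Predictor sees only the current latent state and the latent action."""
    return latent_state + latent_action

def mse(a, b):
    """Mean squared error between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

current = render(position=1, brightness=0.2)
static = render(position=1, brightness=0.8)  # only the lighting changed
future = render(position=2, brightness=0.8)  # object moved and lighting changed

# Pixel objective: nonzero even when nothing moved, so lighting leaks into it.
pixel_loss_no_motion = mse(current, static)

# Latent objective: lighting is invisible, only the state change counts.
latent_loss_no_motion = (target_encode(current) - target_encode(static)) ** 2

# Leakage-free prediction: the future frame appears only on the target side.
latent_action = 1.0  # "move one step right"
predicted = predict_latent(target_encode(current), latent_action)
latent_loss = (predicted - target_encode(future)) ** 2

print(pixel_loss_no_motion, latent_loss_no_motion, latent_loss)
```

In this toy the pixel loss penalizes a pure lighting change while the latent loss ignores it, which mirrors the appearance-bias row of the table above.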

## References

- [[vla-jepa-2026]]
- [[vla-vision-language-action]]
- [[leakage-free-state-prediction]]
- [[appearance-bias-vla]]
- [[information-leakage-vla]]