20260625: lots of new content
concepts/appearance-bias-vla.md

---
title: "Appearance Bias in VLA"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["vla", "bias", "pretraining", "representation-learning"]
sources:
- "[[vla-jepa-2026]]"
---
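
# Appearance Bias in VLA Pretraining

Appearance bias is a systematic failure mode of pixel-level pretraining objectives for VLA models: the learned representation is biased toward changes in visual appearance (texture, illumination, background) rather than toward the controllable, action-relevant degrees of freedom.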
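
## Symptoms

- Illumination changes are encoded as salient "features".
- Replacing the background texture causes large changes in the latent action.
- A shift in camera angle affects the representation more strongly than the action-induced transition does.
- Compression mechanisms such as VQ-VAE do not fully remove the bias: the compressed space still retains a large amount of appearance information.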
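
The symptoms above suggest a simple probe: compare how far the inferred latent action moves under an appearance-only perturbation with how far it moves under a genuinely different action. The sketch below is a toy under stated assumptions: frames are flattened 1-D lists, `frame` and `encode_latent_action` are hypothetical helpers, and the encoder is a deliberately appearance-biased stand-in (a raw pixel difference) for a pixel-trained latent action model. A ratio above 1 means the lighting change moved the latent action more than a change of action did.

```python
import math

NUM_PIXELS = 32  # toy 1-D "image"; real frames are H x W x C


def frame(obj_pos, brightness=0.0, num_pixels=NUM_PIXELS):
    """Grey background (0.5) plus one object pixel (1.0), shifted by a global lighting offset."""
    pixels = [0.5 + brightness] * num_pixels
    pixels[obj_pos] = 1.0 + brightness
    return pixels


def encode_latent_action(frame_t, frame_t1):
    """Hypothetical, deliberately appearance-biased encoder: the raw pixel difference."""
    return [b - a for a, b in zip(frame_t, frame_t1)]


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def appearance_sensitivity(f_t, f_next, f_next_other_action, f_next_relit):
    """Latent shift under an appearance-only change divided by the latent
    shift under a different action. Values > 1 mean appearance dominates."""
    z = encode_latent_action(f_t, f_next)
    action_shift = l2(z, encode_latent_action(f_t, f_next_other_action))
    appearance_shift = l2(z, encode_latent_action(f_t, f_next_relit))
    return appearance_shift / action_shift


ratio = appearance_sensitivity(
    frame(4),
    frame(5),                  # object moves right
    frame(3),                  # different action: object moves left
    frame(5, brightness=0.2),  # same action, but the lighting changed
)
print(f"{ratio:.2f}")  # → 1.60
```

With a learned encoder, the same probe would be run over many frame pairs and perturbation types (background swaps, viewpoint shifts) rather than a single hand-built case.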
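
## Root Cause

Variation in pixel space is dominated by appearance factors, which are:

1. High-variance (texture, illumination, clutter, viewpoint)
2. Low in controllability (only weakly correlated with robot actions)
3. Easy to predict (low modeling difficulty)

Because these factors account for most of the pixel-level error yet are cheap to model, the model naturally learns to predict this "low-hanging fruit" instead of the true action semantics.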
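
A toy calculation makes the argument concrete. Assume a flattened frame in which a global illumination change touches every pixel while the action moves a single object pixel; the frame size, the offsets and the `frame` helper are illustrative assumptions, not taken from any specific model. Starting from a copy-the-last-frame baseline, the sketch measures what fraction of the pixel MSE each partial predictor removes.

```python
NUM_PIXELS = 64  # toy 1-D "image"


def frame(obj_pos, brightness=0.0, num_pixels=NUM_PIXELS):
    """Grey background (0.5) plus one object pixel (1.0), shifted by a global lighting offset."""
    pixels = [0.5 + brightness] * num_pixels
    pixels[obj_pos] = 1.0 + brightness
    return pixels


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


current = frame(10)
target = frame(11, brightness=0.2)  # the action moved the object AND the lighting changed

baseline = mse(current, target)  # predictor that models nothing
err_appearance_only = mse(frame(10, brightness=0.2), target)  # models lighting, ignores action
err_action_only = mse(frame(11), target)  # models the action, ignores lighting

# Fraction of the baseline pixel error removed by each partial predictor.
gain_appearance = (baseline - err_appearance_only) / baseline
gain_action = (baseline - err_action_only) / baseline
print(f"{gain_appearance:.3f} {gain_action:.3f}")  # → 0.837 0.163
```

In this toy setting, predicting a single scalar lighting offset removes about 84% of the baseline error, while getting the action right removes only about 16%, so a pixel-reconstruction objective is rewarded far more for modeling appearance.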

## How JEPA Fixes It

By predicting in latent space rather than pixel space, the JEPA objective does not directly model pixel changes, which forces the model to abstract at the semantic level.
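
The sketch below contrasts the two objectives under strong simplifying assumptions. The `encode` function is hypothetical and hand-coded to discard a global illumination offset (it subtracts the frame mean); in a JEPA-style setup the encoders are learned, and the target branch is typically an EMA copy under stop-gradient to avoid representational collapse, none of which is modeled here. It only illustrates the structural point: when the loss is computed between embeddings, variation that the target encoder does not represent contributes nothing to it.

```python
NUM_PIXELS = 64  # toy 1-D "image"


def frame(obj_pos, brightness=0.0, num_pixels=NUM_PIXELS):
    """Grey background (0.5) plus one object pixel (1.0), shifted by a global lighting offset."""
    pixels = [0.5 + brightness] * num_pixels
    pixels[obj_pos] = 1.0 + brightness
    return pixels


def encode(pixels):
    """Hypothetical encoder: subtracting the frame mean removes the global lighting offset."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pixel_loss(pred_frame, next_frame):
    # Generative objective: the target is the raw next frame.
    return sq_dist(pred_frame, next_frame)


def latent_loss(pred_embedding, next_frame):
    # JEPA-style objective: the target is the embedding of the next frame.
    return sq_dist(pred_embedding, encode(next_frame))


# A predictor that gets the action right (object moves from 10 to 11)
# but knows nothing about the lighting.
pred_frame = frame(11)
pred_embedding = encode(pred_frame)

next_same_light = frame(11)
next_new_light = frame(11, brightness=0.2)  # same action, different lighting

px_same = pixel_loss(pred_frame, next_same_light)
px_lit = pixel_loss(pred_frame, next_new_light)
lat_same = latent_loss(pred_embedding, next_same_light)
lat_lit = latent_loss(pred_embedding, next_new_light)
print(f"pixel:  {px_same:.4f} -> {px_lit:.4f}")   # → pixel:  0.0000 -> 2.5600
print(f"latent: {lat_same:.4f} -> {lat_lit:.4f}")  # → latent: 0.0000 -> 0.0000
```

Note that a constant encoder would also drive the latent loss to zero; this is why real JEPA training needs an explicit anti-collapse mechanism rather than relying on the latent target alone.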

## References

- [[vla-jepa-2026]]
- [[latent-action-pretraining]]
- [[leakage-free-state-prediction]]