---
title: "Appearance Bias in VLA"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["vla", "bias", "pretraining", "representation-learning"]
sources:
- "[[vla-jepa-2026]]"
---
# Appearance Bias in VLA Pretraining
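Appearance Bias is a systematic failure mode of pixel-level pretraining objectives for VLA (vision-language-action) models: the learned representations are biased toward changes in visual appearance (texture, lighting, background) rather than toward the controllable, action-relevant degrees of freedom.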
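## Symptoms
- Lighting changes are encoded as salient "features"
- Swapping the background texture causes large changes in the latent action
- Camera viewpoint shifts affect the representation more strongly than action-induced transitions do
- Compression mechanisms such as VQ-VAE do not fully eliminate the bias: the compressed space still retains a large amount of appearance information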
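One way to check a learned encoder for the symptoms above is a perturbation probe: compare how far the embedding moves under an appearance-only change with how far it moves under an action-induced change. The sketch below is hypothetical and not taken from [[vla-jepa-2026]]; `appearance_bias_ratio`, the two toy encoders, and the two-feature observation (gripper position, light level) are illustrative assumptions standing in for real frames and a real latent action model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Encoder = Callable[[Vector], Vector]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def appearance_bias_ratio(encode: Encoder, obs: Vector,
                          obs_after_action: Vector,
                          obs_appearance_only: Vector) -> float:
    """Embedding shift under an appearance-only change divided by the shift
    under an action-induced change. Values > 1 suggest appearance bias."""
    z0 = encode(obs)
    action_shift = l2(encode(obs_after_action), z0)
    appearance_shift = l2(encode(obs_appearance_only), z0)
    return appearance_shift / max(action_shift, 1e-8)


# Hypothetical toy observations (not real frames): (gripper_x, light_level).
obs = (0.0, 0.5)
after_action = (1.0, 0.5)   # the robot moved; lighting unchanged
relit = (0.0, 1.0)          # lighting changed; the robot did not move


def biased_encoder(o: Vector) -> Vector:
    # Over-weights lighting, mimicking an appearance-biased representation.
    return (0.1 * o[0], 2.0 * o[1])


def action_encoder(o: Vector) -> Vector:
    # Keeps only the controllable degree of freedom.
    return (o[0],)


print(appearance_bias_ratio(biased_encoder, obs, after_action, relit))  # ~10
print(appearance_bias_ratio(action_encoder, obs, after_action, relit))  # 0.0
```

With real data the same ratio would be averaged over many (frame, action, perturbation) triples, for example background swaps or color jitter as the appearance-only change.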
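## Root Cause
Changes in pixel space are dominated by appearance factors, which are:
1. High-variance (texture, illumination, clutter, viewpoint)
2. Weakly controllable (only loosely correlated with robot actions)
3. Easy to predict (low modeling difficulty)

As a result, the model naturally learns to predict these "low-hanging fruit" instead of genuine action semantics.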
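A toy numerical check of this argument, assuming 8x8 grayscale frames with values in [0, 1]: moving a small object by one pixel (an action-relevant change) costs less pixel-space MSE than a modest global brightness shift (an appearance change), so a pixel objective is pulled toward the latter. The frame generator `make_frame` and all values are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def make_frame(size: int = 8, obj_x: int = 2, obj_y: int = 2,
               brightness: float = 0.5, obj_value: float = 1.0) -> Frame:
    """Uniform background of `brightness` with a 2x2 object at (obj_x, obj_y)."""
    frame = [[brightness] * size for _ in range(size)]
    for dy in range(2):
        for dx in range(2):
            frame[obj_y + dy][obj_x + dx] = obj_value
    return frame


def pixel_mse(a: Frame, b: Frame) -> float:
    """Mean squared error over all pixels, as in a pixel-reconstruction loss."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n


base = make_frame()
moved = make_frame(obj_x=3)          # action-relevant: object shifted one pixel
relit = make_frame(brightness=0.7)   # appearance: global illumination change

print(pixel_mse(base, moved))  # 0.015625
print(pixel_mse(base, relit))  # ≈ 0.0375
```

The brightness shift is a single scalar shared by 60 of the 64 pixels, which also illustrates the "easy to predict" point: a model that learns only the global offset removes most of that error without representing the object at all.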
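## The JEPA Fix
By predicting in latent space rather than in pixel space, the JEPA (Joint-Embedding Predictive Architecture) objective never directly models pixel-level changes, which forces the model to abstract at the semantic level.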
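The structure of a JEPA-style objective can be sketched as follows, assuming linear encoders over a two-feature observation (gripper position, light level) and omitting training, action conditioning, and autodiff. The names `jepa_loss` and `ema_update` are hypothetical; the stop-gradient / EMA target encoder follows the recipe commonly used in JEPA variants and is not necessarily the exact setup in [[vla-jepa-2026]].

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def jepa_loss(context_enc: Matrix, target_enc: Matrix, predictor: Matrix,
              x_t: Vector, x_next: Vector) -> float:
    """Mean squared error between predicted and target *embeddings*.

    Pixels never enter the loss directly. In a real trainer the target
    embedding is treated as a constant (stop-gradient).
    """
    z_ctx = matvec(context_enc, x_t)
    z_pred = matvec(predictor, z_ctx)
    z_tgt = matvec(target_enc, x_next)  # stop-gradient in practice
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_tgt)) / len(z_tgt)


def ema_update(target_enc: Matrix, context_enc: Matrix,
               decay: float = 0.99) -> Matrix:
    """Move the target encoder weights slowly toward the context encoder."""
    return [[decay * t + (1.0 - decay) * c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(target_enc, context_enc)]


# Hypothetical observation features: [gripper_x, light_level].
enc = [[1.0, 0.0]]                    # an encoder that has learned to ignore lighting
pred = [[1.0]]                        # identity predictor (no action in this step)
x_t, x_next = [0.0, 0.5], [0.0, 0.9]  # only the lighting changed

print(jepa_loss(enc, enc, pred, x_t, x_next))  # 0.0
```

A reconstruction loss on the same raw pair would be about 0.08, all of it from lighting. The benefit depends on the encoder actually discarding appearance; a latent objective on its own can also collapse to trivial embeddings, which is one reason JEPA-style methods use a stop-gradient / EMA target.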
## References
- [[vla-jepa-2026]]
- [[latent-action-pretraining]]
- [[leakage-free-state-prediction]]