

Sidney Zhang 6021dea160

20260625: lots of new content

2026-06-25 14:08:47 +08:00



title: Appearance Bias in VLA
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: vla, bias, pretraining, representation-learning
sources:

Appearance Bias in VLA Pretraining

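Appearance bias is a systematic failure mode of pixel-level pretraining objectives for VLA (Vision-Language-Action) models: the learned representations are biased toward changes in visual appearance (texture, lighting, background) rather than toward the action-relevant, controllable degrees of freedom.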

Symptoms

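Lighting changes get encoded as important "features"
Swapping the background texture causes large changes in the latent action
A shift in camera angle affects the representation more strongly than the action transition itself
Compression mechanisms such as VQ-VAE still do not fully eliminate it: the compressed space retains a large amount of appearance information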
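The symptoms above can be probed by comparing how far a representation moves under an appearance-only edit versus an action-only change. The sketch below is a minimal pure-Python version under stated assumptions: frames are flat lists of floats, `encode` is any representation function, and the names `appearance_sensitivity_ratio`, `pixel_like` and `action_only` are hypothetical illustrations, not part of any VLA codebase.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_shift(encode: Callable[[Vector], Vector], pairs: List[Pair]) -> float:
    """Average latent distance between the two frames of each pair."""
    return sum(l2(encode(a), encode(b)) for a, b in pairs) / len(pairs)


def appearance_sensitivity_ratio(
    encode: Callable[[Vector], Vector],
    appearance_pairs: List[Pair],
    action_pairs: List[Pair],
) -> float:
    """Latent shift under appearance-only edits divided by latent shift under
    action-only changes; values well above 1 indicate appearance bias."""
    action_shift = mean_shift(encode, action_pairs)
    if action_shift == 0.0:
        return math.inf  # the representation does not react to actions at all
    return mean_shift(encode, appearance_pairs) / action_shift


# Toy frames (assumed): 3 appearance values (lighting/background) + 1 gripper position.
base = [0.2, 0.4, 0.6, 0.0]
relit = [v + 0.5 for v in base[:3]] + [base[3]]  # lighting change only
moved = base[:3] + [base[3] + 0.2]                # gripper moves only


def pixel_like(x: Vector) -> Vector:
    """Stand-in for a pixel-level representation: keeps every value."""
    return list(x)


def action_only(x: Vector) -> Vector:
    """Hypothetical ideal latent that keeps only the gripper position."""
    return [x[3]]


r_pixel = appearance_sensitivity_ratio(pixel_like, [(base, relit)], [(base, moved)])
r_action = appearance_sensitivity_ratio(action_only, [(base, relit)], [(base, moved)])
print(f"pixel-like: {r_pixel:.2f}  action-only: {r_action:.2f}")
```

In this toy setup the pixel-like representation scores about 4.33 and the action-only one 0.00. With real data, the pairs would plausibly come from controlled augmentations (relighting, background swaps, camera jitter) of recorded episodes.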

Root Cause

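Variation in pixel space is dominated by appearance factors, which have:

High variance (texture, illumination, clutter, viewpoint)
Low controllability (weakly correlated with robot actions)
Low prediction difficulty (easy to model)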

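As a result, the model naturally learns to predict this "low-hanging fruit" rather than genuine action semantics: these factors account for most of the pixel-level error and are cheap to model, so fitting them lowers a pixel objective fastest.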
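A toy calculation makes the "low-hanging fruit" argument concrete. The sketch assumes a 100-pixel frame, a global lighting shift of 0.3 on every pixel and a gripper motion that changes only 2 pixels by 0.5; all of these numbers are illustrative assumptions. It compares the pixel MSE of a predictor that models only the lighting with one that models only the action.

```python
from typing import List

N_PIXELS = 100      # assumed toy frame size
LIGHT_SHIFT = 0.3   # assumed global illumination change (touches every pixel)
GRIPPER_PIXELS = 2  # assumed number of pixels covered by the moving gripper
GRIPPER_SHIFT = 0.5 # assumed intensity change where the gripper moved


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error over all pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def gripper_delta() -> List[float]:
    """Action-driven change: only the first GRIPPER_PIXELS pixels move."""
    return [GRIPPER_SHIFT if i < GRIPPER_PIXELS else 0.0 for i in range(N_PIXELS)]


x_t = [0.0] * N_PIXELS
# Next frame = lighting change everywhere + small action-driven change.
x_next = [v + LIGHT_SHIFT + d for v, d in zip(x_t, gripper_delta())]

# Predictor that models only appearance (lighting) and ignores the action.
pred_appearance = [v + LIGHT_SHIFT for v in x_t]
# Predictor that models only the action and ignores lighting.
pred_action = [v + d for v, d in zip(x_t, gripper_delta())]

loss_appearance = mse(pred_appearance, x_next)
loss_action = mse(pred_action, x_next)
print(f"appearance-only: {loss_appearance:.4f}  action-only: {loss_action:.4f}")
```

Under these assumptions the lighting-only predictor scores 0.005 versus 0.09 for the action-only predictor, an 18x gap, even though only the latter captures anything the robot controls.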

The JEPA Fix

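By predicting in latent space rather than in pixel space, the JEPA (Joint-Embedding Predictive Architecture) objective does not model pixel changes directly, which forces the model to abstract at the semantic level.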
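The shift from pixel-space to latent-space prediction can be illustrated by computing the loss on encoded targets instead of raw pixels. This is a toy sketch, not JEPA itself: `center` (subtracting the frame's mean brightness) is a hand-picked, hypothetical stand-in for a learned appearance-invariant encoder, and the 100-pixel frame with a global lighting shift and a 2-pixel gripper motion is an assumed example. It does not show that JEPA training learns such invariance; it only shows where the loss is measured and how that changes which predictor is favored.

```python
from typing import Callable, List

Vector = List[float]

N_PIXELS = 100       # assumed toy frame size
LIGHT_SHIFT = 0.3    # assumed global lighting change (every pixel)
GRIPPER_PIXELS = 2   # assumed pixels covered by the gripper
GRIPPER_SHIFT = 0.5  # assumed intensity change where the gripper moved


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def center(x: Vector) -> Vector:
    """Hypothetical appearance-invariant encoder: subtracts mean brightness,
    so a uniform lighting shift leaves the latent unchanged."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def latent_prediction_loss(
    context_encoder: Callable[[Vector], Vector],
    target_encoder: Callable[[Vector], Vector],
    predictor: Callable[[Vector], Vector],
    x_t: Vector,
    x_next: Vector,
) -> float:
    """Compare predicted and target latents; no pixel decoder is involved."""
    z_pred = predictor(context_encoder(x_t))
    z_target = target_encoder(x_next)  # typically gradient-stopped in real training
    return mse(z_pred, z_target)


gripper = [GRIPPER_SHIFT if i < GRIPPER_PIXELS else 0.0 for i in range(N_PIXELS)]
lighting = [LIGHT_SHIFT] * N_PIXELS
x_t = [0.0] * N_PIXELS
x_next = [light + grip for light, grip in zip(lighting, gripper)]


def predict_action(z: Vector) -> Vector:
    """Latent predictor that models only the action-driven change."""
    return [a + b for a, b in zip(z, center(gripper))]


def predict_lighting(z: Vector) -> Vector:
    """Latent predictor that models only the lighting change."""
    return [a + b for a, b in zip(z, center(lighting))]


# Pixel-space losses (x_t is all zeros, so each prediction is just its delta).
pixel_action = mse(gripper, x_next)
pixel_lighting = mse(lighting, x_next)
# Latent-space losses for the same two hypotheses.
latent_action = latent_prediction_loss(center, center, predict_action, x_t, x_next)
latent_lighting = latent_prediction_loss(center, center, predict_lighting, x_t, x_next)
print(f"pixel:  action={pixel_action:.4f}  lighting={pixel_lighting:.4f}")
print(f"latent: action={latent_action:.4f}  lighting={latent_lighting:.4f}")
```

With these assumptions the pixel loss favors the lighting-only predictor (0.005 vs 0.09), while the latent loss favors the action-only predictor (essentially 0 vs about 0.0049). JEPA-style methods typically keep the target branch gradient-stopped (often an EMA copy of the context encoder) so the latent objective cannot be trivially solved by collapsing all representations to a constant.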

References