---
title: "Steering Vector"
created: 2026-06-01
updated: 2026-06-01
type: concept
tags: [steering, interpretability]
sources: [raw/papers/xu-why-steering-works-2026.md]
confidence: medium
---
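# Steering Vector
> Placeholder page, to be expanded.
## Overview
A steering vector is a direction vector extracted from a model's activations. It is used to control model behavior at inference time through an additive intervention ([[activation-steering]]).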
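
The additive intervention can be illustrated with a minimal sketch. The function name `steer`, the `strength` parameter, and the toy values are assumptions made for illustration; they do not come from the source paper, and a real model would apply the same operation to the hidden states of a chosen layer.

```python
def steer(hidden, vector, strength=1.0):
    """Return hidden + strength * vector, computed element-wise.

    `hidden` and `vector` are plain lists of floats with the same length.
    All names here are hypothetical, chosen only for this sketch.
    """
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + strength * v for h, v in zip(hidden, vector)]


hidden = [0.5, -1.0, 2.0]   # toy hidden state (made-up values)
vector = [1.0, 0.0, -1.0]   # toy steering vector (made-up values)

steered = steer(hidden, vector, strength=2.0)
unchanged = steer(hidden, vector, strength=0.0)
```

With `strength=0.0` the hidden state is returned unchanged, so the strength acts as a dial on how far the activation is pushed along the direction.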
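## Extraction Methods
- **DiffMean** (Marks & Tegmark, 2023): takes the mean of the activation differences between positive and negative examples of a concept
- **SFT-based**: learns an optimal direction through supervised training
- **RePS**: training based on preference signals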
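
Of the three methods, DiffMean is simple enough to sketch directly. The version below is an illustration under stated assumptions, not the cited authors' implementation: activations are plain lists of floats, and the direction is computed as the difference between the two class means, which equals the mean of pairwise differences when the positive and negative examples are paired one-to-one. The helper names `mean_vector` and `diff_mean` are hypothetical.

```python
def mean_vector(rows):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def diff_mean(positive, negative):
    """Mean positive activation minus mean negative activation."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


# Toy activations (made-up values), one row per example.
positive = [[1.0, 2.0], [3.0, 4.0]]   # examples that express the concept
negative = [[0.0, 1.0], [2.0, 1.0]]   # examples that do not

direction = diff_mean(positive, negative)
```

In practice the rows would be activations collected at one layer and token position. Whether to normalize the resulting direction is a separate design choice that this sketch leaves out.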
## In the Unified Framework
A steering vector is equivalent to a bias update $\Delta b$, with $m$ scaling the update: $h_{i+1} = W h_i + (b + m \Delta b)$
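
The equivalence can be checked numerically on a toy linear layer. This is a sketch under assumptions: `linear` is a hypothetical helper, the numbers are made up, and $m$ is treated as a scalar strength. It compares updating the bias to $b + m \Delta b$ against adding $m \Delta b$ to the layer output after the fact.

```python
def linear(W, h, b):
    """Compute W h + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


# Toy layer and input (made-up values).
W = [[1.0, 2.0], [0.0, -1.0]]
h = [3.0, 1.0]
b = [0.5, 0.5]
delta_b = [1.0, -2.0]   # plays the role of the steering vector
m = 2.0                 # assumed scalar strength

# Route 1: fold the steering vector into the bias.
updated_bias = [bi + m * d for bi, d in zip(b, delta_b)]
via_bias = linear(W, h, updated_bias)

# Route 2: run the unmodified layer, then add m * delta_b to its output.
base = linear(W, h, b)
via_addition = [x + m * d for x, d in zip(base, delta_b)]
```

Both routes give the same vector because the bias enters the layer additively. With general floating-point values a tolerance-based comparison such as `math.isclose` would be safer than exact equality.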
## Related Concepts
- [[activation-steering]]
- [[linear-representation-hypothesis]]
- [[dynamic-weight-updates]]
- [[xu-why-steering-works]]