20260617: 914 pages so far
34
concepts/steering-vector.md
Normal file
@@ -0,0 +1,34 @@
---
title: "Steering Vector"
created: 2026-06-01
updated: 2026-06-01
type: concept
tags: [steering, interpretability]
sources: [raw/papers/xu-why-steering-works-2026.md]
confidence: medium
---

# Steering Vector

> Placeholder page, to be expanded

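## Overview

A steering vector is a direction vector extracted from a model's activations. At inference time it is added to the activations (an additive intervention, see [[activation-steering]]) to control the model's behavior.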
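
As a minimal sketch of the additive intervention, the snippet below adds a scaled steering vector to a single hidden state. Plain Python lists stand in for activation tensors, and the function name `steer`, the `strength` parameter, and the toy values are assumptions made for illustration, not taken from the source.

```python
def steer(hidden, vector, strength=1.0):
    """Return hidden + strength * vector, computed element-wise."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional hidden state and steering vector (illustrative values only).
hidden = [0.5, -1.0, 2.0]
vector = [1.0, 0.0, -1.0]

# Push the hidden state along the steering direction with strength 2.
steered = steer(hidden, vector, strength=2.0)
print(steered)
```

With `strength=0.0` the hidden state is returned unchanged, which makes the strength a convenient knob for comparing steered and unsteered behavior.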
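
## Extraction methods

- **DiffMean** (Marks & Tegmark, 2023): takes the mean of the activation differences between positive and negative examples of a concept
- **SFT-based**: learns the optimal direction through supervised training
- **RePS**: trains the direction on preference signals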
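
The DiffMean entry in the list above can be sketched in a few lines. This version computes the direction as the difference of the two class means, which coincides with the mean of pairwise differences when the positive and negative sets have the same size. The helper names `mean_vector` and `diff_mean` and the toy activations are hypothetical, chosen only for illustration.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_mean(pos_acts, neg_acts):
    """Direction pointing from the negative-class mean to the positive-class mean."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


# Toy 2-dimensional activations for two positive and two negative examples.
pos_acts = [[1.0, 2.0], [3.0, 4.0]]
neg_acts = [[0.0, 1.0], [1.0, 1.0]]

direction = diff_mean(pos_acts, neg_acts)
print(direction)
```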

## In the unified framework

A steering vector is equivalent to a bias update $\Delta b$ scaled by a factor $m$: $h_{i+1} = W h_i + (b + m \Delta b)$
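
The equivalence can be checked numerically on a toy layer: adding $m \Delta b$ to the layer output gives the same result as folding it into the bias. The helper `linear` and all numeric values below are assumptions for illustration; only the formula itself comes from the text.

```python
def linear(W, h, b):
    """Compute W h + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + b_j for row, b_j in zip(W, b)]


# Toy 2x2 layer (illustrative values only).
W = [[1.0, 2.0], [0.0, -1.0]]
h = [1.0, 3.0]
b = [0.5, 0.5]
delta_b = [1.0, -2.0]  # steering vector, viewed as a bias update
m = 2.0                # scaling factor on the update

# Option 1: run the layer unchanged, then add the scaled steering vector.
plain_output = linear(W, h, b)
steered_output = [y + m * d for y, d in zip(plain_output, delta_b)]

# Option 2: fold the scaled steering vector into the bias, then run the layer.
shifted_bias = [b_j + m * d for b_j, d in zip(b, delta_b)]
folded_output = linear(W, h, shifted_bias)

print(steered_output, folded_output)
```

Both options produce the same next hidden state, which is why the intervention can be read as a change to the layer's parameters rather than to its activations.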

## Related concepts

- [[activation-steering]]
- [[linear-representation-hypothesis]]
- [[dynamic-weight-updates]]
- [[xu-why-steering-works]]