20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions


@@ -0,0 +1,34 @@
---
title: "Steering Vector"
created: 2026-06-01
updated: 2026-06-01
type: concept
tags: [steering, interpretability]
sources: [raw/papers/xu-why-steering-works-2026.md]
confidence: medium
---
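# Steering Vector
> Placeholder page: to be expanded
## Overview
A steering vector is a direction vector extracted from model activations. At inference time it is added to the hidden state ([[activation-steering]]) to control model behavior.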
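## Extraction Methods
- **DiffMean** (Marks & Tegmark, 2023): the difference between the mean activations on positive and negative examples of a concept
- **SFT-based**: learns the optimal direction through supervised training
- **RePS**: trains the direction from preference signals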
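A minimal sketch of DiffMean. It assumes activations have already been collected at a single layer and stored as plain lists of floats. The function names and toy data are hypothetical, and real pipelines usually operate on tensors.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


def diffmean_direction(pos_acts: Sequence[Sequence[float]],
                       neg_acts: Sequence[Sequence[float]]) -> Vector:
    """DiffMean: mean activation on positive examples minus mean on negative examples."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


if __name__ == "__main__":
    # Hypothetical toy data: 2-D hidden states taken at one layer.
    pos = [[1.0, 2.0], [3.0, 4.0]]
    neg = [[0.0, 1.0], [2.0, 1.0]]
    print(diffmean_direction(pos, neg))  # [1.0, 2.0]
```

Only forward passes and averaging are involved here. This contrasts with the SFT-based and RePS variants, which optimize the direction with a training objective.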
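## In the Unified Framework
A steering vector is equivalent to a bias update $\Delta b$, added to the layer bias $b$ with coefficient $m$: $h_{i+1} = W h_i + (b + m \Delta b)$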
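The numerical check below illustrates why the two views coincide for a single dense layer. It assumes $m$ is a scalar coefficient, which is an assumption since the source does not define it here, and it treats $\Delta b$ as the steering vector. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(W: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W h + b for a dense layer."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]


def steered_by_bias(W: Matrix, h: Vector, b: Vector,
                    delta_b: Vector, m: float) -> Vector:
    """Unified-framework view: fold the steering vector into the bias, W h + (b + m * delta_b)."""
    new_b = [b_i + m * d_i for b_i, d_i in zip(b, delta_b)]
    return linear(W, h, new_b)


def steered_by_addition(W: Matrix, h: Vector, b: Vector,
                        v: Vector, m: float) -> Vector:
    """Activation-steering view: compute the layer output first, then add m * v."""
    return [y_i + m * v_i for y_i, v_i in zip(linear(W, h, b), v)]


if __name__ == "__main__":
    # Hypothetical toy layer and steering vector; m is assumed to be a scalar.
    W = [[1.0, 2.0], [0.0, 1.0]]
    h = [1.0, 1.0]
    b = [0.5, 0.5]
    v = [1.0, -1.0]
    print(steered_by_bias(W, h, b, v, 2.0))      # [5.5, -0.5]
    print(steered_by_addition(W, h, b, v, 2.0))  # [5.5, -0.5]
```

The additive term enters after the matrix product, so both forms agree exactly. Steering leaves $W$ untouched and changes only the bias.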
## Related Concepts
- [[activation-steering]]
- [[linear-representation-hypothesis]]
- [[dynamic-weight-updates]]
- [[xu-why-steering-works]]