20260617: currently 914 pages
39
concepts/model-steering.md
Normal file
@@ -0,0 +1,39 @@
---
title: "Model Steering"
created: 2026-06-01
updated: 2026-06-01
type: concept
tags: [steering, controllability, llm]
sources: [raw/papers/xu-why-steering-works-2026.md]
---

# Model Steering

> Placeholder page: the broader literature on LLM steering

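## Overview

Model steering is an umbrella term for techniques that control LLM behavior at inference time by modifying the model's internal representations or parameters, including but not limited to:
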
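- **Activation steering** ([[activation-steering]]): adding a direction vector to the hidden states
- **Parameter intervention**: local weight fine-tuning, LoRA ([[lora]]) adaptation
- **Inference-time alignment**: control through the system prompt or in-context input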
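
As a minimal sketch of the activation-steering item above, adding a direction vector to a hidden state amounts to `h' = h + m * v`. The function name `steer_hidden_state`, the toy vectors, and the use of plain Python lists are illustrative assumptions; a real implementation would operate on framework tensors inside a model's forward pass.

```python
def steer_hidden_state(hidden, direction, strength):
    """Return hidden + strength * direction, element-wise.

    All names here are hypothetical; this only illustrates the arithmetic.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy values chosen for illustration only.
hidden = [0.5, -1.0, 2.0]        # hypothetical hidden state
direction = [1.0, 0.0, -1.0]     # hypothetical steering direction
steered = steer_hidden_state(hidden, direction, strength=2.0)
unchanged = steer_hidden_state(hidden, direction, strength=0.0)
```

With `strength=0.0` the hidden state is returned unchanged, so the strength acts as a dial between the unsteered model and the full intervention.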

## Unified View

The [[dynamic-weight-updates]] framework of Xu et al. (2026) treats all of these methods as dynamic weight updates and shows that they share a common [[preference-utility-analysis|preference-utility trade-off]] pattern.
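
One way to read this unified view is sketched below, under the assumption that each method can be written as a scaled update to a single linear layer, `y = (W + m * dW) x + (b + m * db)`. That form, the helper names, and the toy matrices are illustrative assumptions and may differ from the exact formulation in Xu et al. (2026). Under this assumption, activation steering corresponds to `dW = 0, db = v`, and a LoRA-style update corresponds to a low-rank `dW = B A` with `db = 0`.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def steered_linear(W, b, x, delta_W, delta_b, m):
    """Apply (W + m * delta_W) x + (b + m * delta_b). Hypothetical helper."""
    W_new = [[w + m * dw for w, dw in zip(rw, rdw)] for rw, rdw in zip(W, delta_W)]
    b_new = [bi + m * dbi for bi, dbi in zip(b, delta_b)]
    return [y + bi for y, bi in zip(matvec(W_new, x), b_new)]


# Toy 2x2 layer: identity weights, zero bias.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [1.0, 2.0]
zeros_W = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]

# Activation steering as a bias-only update: delta_W = 0, delta_b = v.
v = [1.0, -1.0]
y_act = steered_linear(W, b, x, zeros_W, v, m=0.5)

# LoRA-style update as a rank-1 weight change: delta_W = B A, delta_b = 0.
B = [[1.0], [0.0]]   # 2x1
A = [[0.0, 1.0]]     # 1x2
y_lora = steered_linear(W, b, x, matmul(B, A), zeros_b, m=0.5)

# m = 0 recovers the original layer output in both cases.
y_base = steered_linear(W, b, x, matmul(B, A), v, m=0.0)
```

Both interventions go through the same `steered_linear` call and differ only in which of `delta_W` and `delta_b` is non-zero, which is the sense in which they can be viewed as one family.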
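
## Core Challenges

- **Preference-utility trade-off**: stronger control → higher preference, lower utility
- **Direction selection**: how to find the optimal $\Delta W$ / $\Delta b$
- **Strength tuning**: the best value of the steering strength $m$ depends on the specific task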
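
The direction and strength questions above can be made concrete with a small sketch. It uses a difference of class means as the direction, which is a common heuristic and not necessarily the method of Xu et al. (2026), and then sweeps $m$ while recording two toy quantities: the shift along the direction and the squared drift from the original state. These are illustrative stand-ins, not the preference and utility metrics from the paper, and all names and data are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_of_means(positive, negative):
    """Direction pointing from the negative-example mean to the positive one."""
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def sweep_strength(hidden, direction, strengths):
    """For each m, return (m, shift along direction, squared drift)."""
    rows = []
    for m in strengths:
        steered = [h + m * d for h, d in zip(hidden, direction)]
        delta = [s - h for s, h in zip(steered, hidden)]
        shift_along = sum(dl * d for dl, d in zip(delta, direction))
        drift = sum(dl * dl for dl in delta)
        rows.append((m, shift_along, drift))
    return rows


# Toy activations for examples with and without the target behavior.
positive = [[2.0, 0.0], [4.0, 2.0]]
negative = [[0.0, 0.0], [2.0, 2.0]]
direction = difference_of_means(positive, negative)

rows = sweep_strength([1.0, 1.0], direction, [0.0, 0.5, 1.0])
for m, shift_along, drift in rows:
    print(f"m={m}: shift={shift_along}, drift={drift}")
```

In practice the sweep would be scored with task-specific preference and utility evaluations rather than these geometric proxies, which is why the best $m$ has to be tuned per task.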

## Related Concepts

- [[activation-steering]]
- [[lora]]
- [[dynamic-weight-updates]]
- [[steering-vector]]
- [[split-steering]]
- [[controlled-text-generation]]