20260617: 914 pages so far

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/concepts/model-steering.md
+++ b/concepts/model-steering.md
@@ -0,0 +1,39 @@
+---
+title: "Model Steering"
+created: 2026-06-01
+updated: 2026-06-01
+type: concept
+tags: [steering, controllability, llm]
+sources: [raw/papers/xu-why-steering-works-2026.md]
+---
+
+# Model Steering
+
+> Placeholder page for the broader literature on LLM steering
+
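+## Overview
+
+Model steering refers broadly to techniques that control LLM behavior at inference time by modifying the model's internal representations or parameters, including but not limited to:
+
+- **Activation steering** ([[activation-steering]]): adding a direction vector to the hidden states
+- **Parameter intervention**: local weight fine-tuning, LoRA ([[lora]]) adaptation
+- **Inference-time alignment**: control through the system prompt or context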
+
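+## A Unified View
+
+The [[dynamic-weight-updates]] framework of Xu et al. (2026) unifies all of these methods as dynamic weight updates and shows that they share a common [[preference-utility-analysis|preference–utility trade-off]] pattern.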
+
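+## Core Challenges
+
+- **Preference–utility trade-off**: stronger control yields higher preference but lower utility
+- **Direction selection**: how to find the optimal $\Delta W$ / $\Delta b$
+- **Strength tuning**: the best value of the strength $m$ depends on the specific task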
+
+## Related Concepts
+
+- [[activation-steering]]
+- [[lora]]
+- [[dynamic-weight-updates]]
+- [[steering-vector]]
+- [[split-steering]]
+- [[controlled-text-generation]]