20260625: lots of new content

concepts/delta-rule.md

---
title: "Delta Rule"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: ["rnn", "gradient-based-memory", "fast-weights"]
sources: ["https://arxiv.org/abs/2503.14456"]
---
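# Delta Rule

## Definition

The Delta Rule is a **gradient-descent**-based mechanism for updating sequence memory. It originates from the classic Widrow-Hoff learning rule (1960) and was brought into modern sequence modeling by DeltaNet (Schlag et al., 2021). The core idea is to treat writing to memory as an online optimization problem: at each step, take a gradient-descent step on the memory matrix S to reduce its prediction error.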
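## Basic Form

```
S_t = S_{t-1} - α_t · ∇l(S_{t-1}, k_t, v_t)
```

where:

- S_t is the matrix-valued state (the memory), learned online as the sequence is read
- k_t is the key (it also acts as the query that probes the memory inside the loss), and v_t is the value
- α_t is the learning rate (usually a scalar)
- l is the loss function (usually mean squared error), and ∇ is its gradient with respect to S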
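For the usual mean-squared-error choice, the gradient has a closed form. This derivation assumes l(S, k, v) = ½‖S k − v‖², with S mapping a key to a value as a column-vector product. Some papers write the transposed form instead. Under these assumptions it expands to the familiar delta-rule update:

$$
\begin{aligned}
l(S, k_t, v_t) &= \tfrac{1}{2}\,\lVert S k_t - v_t \rVert_2^2 \\
\nabla_S\, l(S_{t-1}, k_t, v_t) &= (S_{t-1} k_t - v_t)\, k_t^\top \\
S_t &= S_{t-1} - \alpha_t\,(S_{t-1} k_t - v_t)\, k_t^\top \\
&= S_{t-1}\,(I - \alpha_t\, k_t k_t^\top) + \alpha_t\, v_t k_t^\top
\end{aligned}
$$

The prediction error S_{t−1}k_t − v_t is the "delta" the rule is named after, up to sign. In the last line, the first term partially erases whatever S_{t−1} already stores along k_t, and the second term writes v_t in its place. With a unit-norm k_t and α_t = 1, the old association is fully overwritten, so S_t k_t = v_t.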
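## Intuition

The Delta Rule recasts sequence processing as **online gradient descent**:

1. Receive an input pair (k_t, v_t).
2. Check whether the current memory S_{t-1} can "recall" the information associated with k_t.
3. Compute the prediction error, which yields the gradient.
4. Step along the negative gradient to update S_{t-1} into S_t.

This gives the model a natural **associative memory** capability.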
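The four steps can be sketched in a few lines of plain Python. This is a minimal illustration that rests on assumptions the note does not state. S maps a key to a value via S·k, the loss is ½‖S k − v‖², and α is a constant scalar. The toy keys and values are made up. The helper names `matvec`, `delta_step` and `zeros` are hypothetical and do not come from any library.

```python
# Minimal sketch of the Delta Rule as online gradient descent on a matrix memory S.
# Assumptions (illustrative only): S maps key k to value v via S @ k,
# loss l = 0.5 * ||S k - v||^2, constant scalar learning rate alpha.

Matrix = list[list[float]]
Vector = list[float]


def zeros(rows: int, cols: int) -> Matrix:
    """An empty memory: rows = value dimension, cols = key dimension."""
    return [[0.0] * cols for _ in range(rows)]


def matvec(S: Matrix, k: Vector) -> Vector:
    """Step 2: recall what the memory currently associates with key k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def delta_step(S: Matrix, k: Vector, v: Vector, alpha: float) -> Matrix:
    """One Delta Rule update: S <- S - alpha * (S k - v) k^T."""
    recalled = matvec(S, k)                          # step 2: current recall
    error = [r - t for r, t in zip(recalled, v)]     # step 3: prediction error S k - v
    # Step 4: the MSE gradient w.r.t. S is the outer product error k^T; move against it.
    return [
        [s_ij - alpha * e_i * k_j for s_ij, k_j in zip(row, k)]
        for row, e_i in zip(S, error)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two orthonormal keys, 2-dimensional values.
    k_a, k_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    S = zeros(2, 3)
    # Step 1: receive (k_t, v_t); k_a is written twice with different values.
    for k, v in [(k_a, [1.0, 2.0]), (k_b, [3.0, -1.0]), (k_a, [5.0, 0.0])]:
        S = delta_step(S, k, v, alpha=1.0)
    print(matvec(S, k_a))  # prints [5.0, 0.0]: the later value replaced the earlier one
    print(matvec(S, k_b))  # prints [3.0, -1.0]: unaffected, since k_b is orthogonal to k_a
```

Each write first subtracts what the memory already recalls for k_t. As a result, the second write to k_a replaces [1.0, 2.0] with [5.0, 0.0] instead of accumulating. A purely additive outer-product write (S ← S + v kᵀ, as in plain linear attention) would return [6.0, 2.0] for k_a. With α = 1 and orthonormal keys each write lands exactly. With correlated keys or a smaller α, recall is only approximate.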
## From DeltaNet to RWKV-7

| Property | DeltaNet | RWKV-7 |
|------|---------|--------|
| Learning rate | Scalar α | Vector a_t ([[in-context-learning-rate]]) |
| Gating | None | Vector-valued gating |
| Key decoupling | The same k_t for both removal (−) and addition (+) | k_remove ≠ k_add |
| Decay | Fixed | Dynamic w_t |

RWKV-7's [[generalized-delta-rule]] keeps the core of the Delta Rule (gradient-descent-style memory updates) while extending it along three key degrees of freedom.
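The table can be made concrete with a hedged sketch of a delta-style update that adds these extra degrees of freedom. It uses a per-channel decay `w`, a per-channel learning-rate vector `a`, and separate keys `k_rm` (what to erase) and `k_add` (what to write). The structure is modeled loosely on the form presented in [[peng-rwkv7]]. Several things here are assumptions made for illustration: the names, the column-vector convention (S maps a key to a value), and the omission of details such as key normalization and how `a` feeds into `k_add`. It is not RWKV-7's actual formula or kernel.

```python
# Hedged sketch of a "generalized" delta update (hypothetical names, NOT RWKV-7's exact formula):
#   S_t = S_{t-1} (diag(w) - k_rm (a * k_rm)^T) + v k_add^T
# S has shape d_v x d_k and maps a key to a value via S @ k.

Matrix = list[list[float]]
Vector = list[float]


def generalized_delta_step(
    S: Matrix, w: Vector, a: Vector, k_rm: Vector, k_add: Vector, v: Vector
) -> Matrix:
    n = len(k_rm)  # key dimension d_k
    # Transition T = diag(w) - k_rm (a ⊙ k_rm)^T combines per-channel decay (diag(w))
    # with an erase along k_rm whose strength is set per channel by a.
    T = [
        [(w[i] if i == j else 0.0) - k_rm[i] * a[j] * k_rm[j] for j in range(n)]
        for i in range(n)
    ]
    # New state: S @ T (decayed, partially erased memory) plus the write v k_add^T.
    return [
        [sum(row[m] * T[m][j] for m in range(n)) + v_i * k_add[j] for j in range(n)]
        for row, v_i in zip(S, v)
    ]
```

As a sanity check, set `w` to all ones, `a` to α in every channel, `k_rm = k` and `k_add = α·k`. The transition then collapses to I − α k kᵀ, and the function reproduces the DeltaNet update S_t = S_{t-1} − α(S_{t-1}k − v)kᵀ. Setting `w` and `a` to zero instead discards the old memory entirely, leaving only the new write v k_addᵀ.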
## Related Concepts

- [[generalized-delta-rule]]: RWKV-7's extended version
- [[in-context-learning-rate]]: the key upgrade from a scalar to a vector learning rate
- [[vector-valued-gating]]: per-channel selective gating
- [[dynamic-state-evolution]]: the Delta Rule combined with dynamic decay
- [[peng-rwkv7|RWKV-7 paper]]
## References

- DeltaNet (Schlag et al., 2021)
- Gated DeltaNet (Yang et al., 2024)
- [[peng-rwkv7|RWKV-7 "Goose"]] (Peng et al., 2025)