20260617: 914 pages so far
52
concepts/token-wise-routing.md
Normal file
@@ -0,0 +1,52 @@
---
title: "Token-Wise Routing"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reasoning, architecture, routing]
sources: [raw/papers/zhang-tarpo-2026.md]
confidence: high
---
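
# Token-Wise Routing

Token-wise routing is the core mechanism of [[tarpo|TARPO]]: at every token generation step, the model itself decides whether the next reasoning unit is a [[hard-token|discrete token]] or a [[soft-token|continuous latent vector]].

## Design Principles

Unlike **fixed-step** or **heuristic switching** schemes, token-wise routing operates at the finest possible granularity, with every step serving as a decision point:
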
```
for t in 1..T:
    h_t = LLM(h_{t-1}, u_{t-1})
    d_t ~ rho(h_t)            # sample the routing decision: hard or soft
    if d_t == hard:
        v_t ~ pi(h_t)         # sample a discrete token from the vocabulary
        u_t = E(v_t)          # embed the sampled token
    else:
        u_t = soft_mix(h_t)   # construct a continuous latent
```
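
To make the loop above concrete, here is a minimal runnable sketch in plain Python. Everything in it is a toy stand-in rather than TARPO's actual implementation: `toy_llm`, the random parameter tables, and the dimensions are hypothetical, and `soft_mix` is assumed to be a probability-weighted average of token embeddings, which this note does not specify.

```python
import math
import random

VOCAB_SIZE, DIM = 5, 4
_init_rng = random.Random(0)

# Hypothetical toy parameters standing in for a real model.
E = [[_init_rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]      # embedding table
W_OUT = [[_init_rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]  # token head
W_ROUTE = [_init_rng.gauss(0, 1) for _ in range(DIM)]                             # router weights


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_llm(h_prev, u_prev):
    """Stand-in for LLM(h_{t-1}, u_{t-1}): a bounded toy recurrence."""
    return [math.tanh(0.5 * h + u) for h, u in zip(h_prev, u_prev)]


def token_probs(h):
    """pi(. | h_t): distribution over the vocabulary."""
    return softmax([dot(row, h) for row in W_OUT])


def p_hard(h):
    """rho(hard | h_t): logistic router on the hidden state."""
    return 1.0 / (1.0 + math.exp(-dot(W_ROUTE, h)))


def soft_mix(h):
    """Assumed definition: probability-weighted average of embeddings."""
    probs = token_probs(h)
    return [sum(p * E[v][i] for v, p in enumerate(probs)) for i in range(DIM)]


def rollout(steps, rng):
    h, u = [0.0] * DIM, [0.0] * DIM
    trace = []
    for _ in range(steps):
        h = toy_llm(h, u)
        if rng.random() < p_hard(h):      # d_t ~ rho(h_t)
            v = rng.choices(range(VOCAB_SIZE), weights=token_probs(h))[0]
            u = E[v]                      # hard: feed back a token embedding
            trace.append(("hard", v))
        else:
            u = soft_mix(h)               # soft: feed back a continuous latent
            trace.append(("soft", None))
    return trace


trace = rollout(steps=8, rng=random.Random(1))
```

Each entry of `trace` records which mode was chosen at that step, so a single rollout can freely interleave hard and soft steps.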

## Key Elements

### Routing Policy

`ρ_θ(d_t | h_t)`: a lightweight classifier that predicts the binary routing decision from the current hidden state.
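
One simple way to realize such a classifier is a logistic layer on the hidden state. The sketch below assumes that form; the `Router` class, its weights, and the toy hidden state are hypothetical and chosen only for illustration.

```python
import math
import random


class Router:
    """Hypothetical logistic router: p(hard | h) = sigmoid(w . h + b)."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def prob_hard(self, h):
        logit = sum(w * x for w, x in zip(self.weights, h)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))

    def sample(self, h, rng):
        """Draw d_t ~ rho(. | h_t); returns "hard" or "soft"."""
        return "hard" if rng.random() < self.prob_hard(h) else "soft"

    def log_prob(self, h, decision):
        """Log-probability of a routing decision, as an RL objective would need."""
        p = self.prob_hard(h)
        return math.log(p if decision == "hard" else 1.0 - p)


router = Router(weights=[0.5, -1.0, 2.0])
h_t = [1.0, 0.0, 0.5]  # toy hidden state
p = router.prob_hard(h_t)
decision = router.sample(h_t, random.Random(0))
```

The router adds only one weight vector and a bias on top of the hidden state, which is what makes it lightweight relative to the token head.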

### Action Space

`A = {soft} ∪ ({hard} × V)`: unifies the routing choice and token sampling in a single action space.
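
The set above has `1 + |V|` elements: one soft action plus one hard action per vocabulary token. The sketch below enumerates it for a toy vocabulary and assigns probabilities under an assumed factorization, `P(soft) = 1 - p_hard` and `P(hard, v) = p_hard * pi(v)`; that factorization and the helper name `action_distribution` are illustrative assumptions, not taken from the source.

```python
def action_distribution(p_hard, token_probs):
    """Joint distribution over A = {soft} ∪ ({hard} × V).

    Assumes P(soft) = 1 - p_hard and P(hard, v) = p_hard * pi(v);
    this factorization is an illustrative choice.
    """
    dist = {("soft", None): 1.0 - p_hard}
    for v, p_v in enumerate(token_probs):
        dist[("hard", v)] = p_hard * p_v
    return dist


pi = [0.5, 0.25, 0.25]  # toy token distribution over a 3-token vocabulary
dist = action_distribution(p_hard=0.8, token_probs=pi)
for action, prob in dist.items():
    print(action, prob)
```

Because the probabilities over all actions sum to one, a single policy-gradient term can cover both the routing choice and the token choice.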
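
### Exploration Mechanism

Routing decisions are **sampled** from the routing policy rather than taken by argmax, which ensures exploration at the level of reasoning modes.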
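
A small experiment illustrates the difference. With a router probability held fixed at a hypothetical `p_hard = 0.8`, greedy (argmax) routing never visits the soft mode, while sampling visits it roughly a fifth of the time. The function names are illustrative only.

```python
import random


def route_greedy(p_hard):
    """Argmax routing: always picks the more likely mode."""
    return "hard" if p_hard >= 0.5 else "soft"


def route_sampled(p_hard, rng):
    """Sampled routing: picks hard with probability p_hard."""
    return "hard" if rng.random() < p_hard else "soft"


rng = random.Random(0)
p_hard = 0.8  # toy router output, held fixed for illustration
greedy = [route_greedy(p_hard) for _ in range(1000)]
sampled = [route_sampled(p_hard, rng) for _ in range(1000)]
soft_fraction = sampled.count("soft") / len(sampled)
```

Under greedy routing the minority mode is never tried, so a learner would get no reward signal about it; sampling keeps both modes in play.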
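
## Advantages

1. **Fine-grained control**: each step is decided independently rather than following a preset fixed pattern.
2. **Adaptive**: the model learns when it needs expressiveness (soft) versus stochasticity (hard).
3. **Learnable**: optimized entirely through RL, with no heuristics or supervised signal required.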
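
As a rough illustration of the third point, the sketch below trains a one-parameter router with plain REINFORCE on a toy task. It is not TARPO's objective: the scalar context `x`, the reward function, the constant baseline of 0.5, and the learning rate are all hypothetical choices made so the example stays small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(x, decision):
    """Hypothetical reward: hard pays off when x = +1, soft when x = -1."""
    return 1.0 if (decision == "hard") == (x > 0) else 0.0


def train_router(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w = 0.0  # single router weight; p(hard | x) = sigmoid(w * x)
    for _ in range(steps):
        x = rng.choice([-1.0, 1.0])
        p = sigmoid(w * x)
        decision = "hard" if rng.random() < p else "soft"
        r = reward(x, decision)
        d = 1.0 if decision == "hard" else 0.0
        # REINFORCE: d/dw log p(decision) = (d - p) * x; 0.5 is a constant baseline.
        w += lr * (r - 0.5) * (d - p) * x
    return w


w = train_router()
print(sigmoid(w * 1.0), sigmoid(w * -1.0))
```

The router is never told which mode is correct for which context; it only sees the reward of the decisions it sampled, and still learns to prefer hard for `x = +1` and soft for `x = -1`.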

## References

- [[action-routing-policy|Action Routing Policy]]
- [[action-head-router|Action Head Router]]
- [[tarpo|TARPO]]
- [[hybrid-reasoning|Hybrid Reasoning]]