---
title: "Token-Wise Routing"
created: 2026-06-17
updated: 2026-06-17
type: concept
tags: [reasoning, architecture, routing]
sources: [raw/papers/zhang-tarpo-2026.md]
confidence: high
---
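# Token-Wise Routing
Token-wise routing is the core mechanism of [[tarpo|TARPO]]: at every token generation step, the model decides on its own whether the next reasoning unit is a [[hard-token|discrete token]] or a [[soft-token|continuous latent vector]].
## Design Principles
Unlike conventional **fixed-stride** or **heuristic switching** schemes, token-wise routing operates at the finest possible granularity, so every step is a decision point: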
```
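for t in 1..T:
    h_t = LLM(h_{t-1}, u_{t-1})
    d_t ~ rho(h_t)            # sample the routing decision: hard or soft
    if d_t == hard:
        v_t ~ pi(h_t)         # sample a discrete token from the vocabulary
        u_t = E(v_t)          # feed back the token's embedding
    else:
        u_t = soft_mix(h_t)   # construct a continuous latent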
```
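The loop above can be made concrete with toy stand-ins. The following is a minimal sketch, not the TARPO implementation: `llm_step`, `route_prob_hard`, `token_probs`, the one-hot embedding table `E`, and the choice of `soft_mix` as a probability-weighted average of token embeddings are all assumptions made for illustration.

```python
import math
import random

VOCAB = ["a", "b", "c", "d"]   # toy vocabulary (assumption)
DIM = 4                        # toy hidden size (assumption)
# Hypothetical embedding table E: one DIM-dimensional one-hot vector per token.
E = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(len(VOCAB))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def llm_step(h_prev, u_prev):
    """Stand-in for LLM(h_{t-1}, u_{t-1}); a fixed nonlinear mix, not a real model."""
    return [math.tanh(0.5 * h + u) for h, u in zip(h_prev, u_prev)]

def route_prob_hard(h):
    """Stand-in for rho(h_t): probability of taking the hard branch."""
    return 1.0 / (1.0 + math.exp(-sum(h)))

def token_probs(h):
    """Stand-in for pi(h_t): the hidden state doubles as vocabulary logits here."""
    return softmax(h)

def soft_mix(h):
    """Assumed soft_mix: probability-weighted average of token embeddings."""
    p = token_probs(h)
    return [sum(p[i] * E[i][j] for i in range(len(VOCAB))) for j in range(DIM)]

def rollout(T, rng):
    h, u = [0.0] * DIM, E[0]
    trace = []
    for _ in range(T):
        h = llm_step(h, u)
        if rng.random() < route_prob_hard(h):          # d_t ~ rho(h_t)
            v = rng.choices(range(len(VOCAB)), weights=token_probs(h))[0]  # v_t ~ pi(h_t)
            u = E[v]                                   # u_t = E(v_t)
            trace.append(("hard", VOCAB[v]))
        else:
            u = soft_mix(h)                            # u_t = soft_mix(h_t)
            trace.append(("soft", None))
    return trace

trace = rollout(8, random.Random(0))
```

Each entry of `trace` records which branch was taken at that step, which makes it easy to inspect how hard and soft steps interleave within a single rollout.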
## Key Components
### Routing Policy
`ρ_θ(d_t | h_t)`: a lightweight classifier that predicts the binary routing decision from the current hidden state.
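One simple way to realize such a classifier is a single logistic unit over the hidden state. The sketch below assumes this form purely for illustration; the class name `LogisticRouter`, its initialization, and the absence of any hidden layer are hypothetical choices, and the source does not specify the router's architecture.

```python
import math
import random

class LogisticRouter:
    """Hypothetical binary router: p(hard | h) = sigmoid(w . h + b)."""

    def __init__(self, dim, rng):
        # Small random weights; a trained router would learn these parameters.
        self.w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.b = 0.0

    def prob_hard(self, h):
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def sample(self, h, rng):
        # Draw d_t from the Bernoulli distribution defined by prob_hard(h).
        return "hard" if rng.random() < self.prob_hard(h) else "soft"

rng = random.Random(0)
router = LogisticRouter(dim=4, rng=rng)
h_t = [0.3, -0.1, 0.8, 0.2]
p = router.prob_hard(h_t)
d_t = router.sample(h_t, rng)
```

The router adds only `dim + 1` parameters on top of the backbone, which is one reading of "lightweight" here.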
### Action Space
`A = {soft} ∪ ({hard} × V)`: unifies the routing choice and token sampling in a single action space, where `V` is the vocabulary.
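Because the routing decision is sampled first and a token is sampled only on the hard branch, the probability of a unified action factorizes as `ρ(soft)` for the soft action and `ρ(hard) · π(v)` for a hard action with token `v`. The sketch below enumerates this action space for a toy vocabulary and checks that the probabilities sum to one; the tuple encoding of actions and the function names are illustrative assumptions.

```python
import math

def enumerate_actions(vocab_size):
    """A = {soft} ∪ ({hard} × V), encoded as tuples (illustrative encoding)."""
    return [("soft",)] + [("hard", v) for v in range(vocab_size)]

def action_log_prob(action, p_hard, token_probs):
    """Log-probability of a unified action under the two-stage sampling scheme."""
    if action[0] == "soft":
        return math.log(1.0 - p_hard)
    _, v = action
    return math.log(p_hard) + math.log(token_probs[v])

p_hard = 0.7                          # toy routing probability
token_probs = [0.1, 0.2, 0.3, 0.4]    # toy token distribution over |V| = 4
actions = enumerate_actions(len(token_probs))
total = sum(math.exp(action_log_prob(a, p_hard, token_probs)) for a in actions)
```

With `|V|` tokens the action space has `1 + |V|` elements, and a single log-probability per step covers both the routing choice and the token choice.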
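### Exploration Mechanism
**Sampling** from the routing policy, rather than taking the argmax, ensures exploration at the level of reasoning modes.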
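A small simulation illustrates the difference. With a routing probability fixed at an assumed toy value of 0.7 for the hard branch, argmax routing never takes the soft branch, while sampling takes it in a fraction of steps close to 0.3. The function names are hypothetical.

```python
import random

def route_argmax(p_hard):
    """Greedy routing: always pick the more probable mode."""
    return "hard" if p_hard >= 0.5 else "soft"

def route_sample(p_hard, rng):
    """Stochastic routing: pick hard with probability p_hard."""
    return "hard" if rng.random() < p_hard else "soft"

rng = random.Random(0)
p_hard = 0.7   # toy value (assumption)
n = 1000

greedy_modes = {route_argmax(p_hard) for _ in range(n)}
sampled = [route_sample(p_hard, rng) for _ in range(n)]
soft_fraction = sampled.count("soft") / n
```

Under greedy routing the minority mode is never visited, so an RL objective would receive no reward signal about it; sampling keeps both modes in the rollout data.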
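## Advantages
1. **Fine-grained control**: every step is an independent decision rather than following a preset fixed pattern.
2. **Adaptive**: the model learns when it needs expressiveness (soft) versus stochasticity (hard).
3. **Learnable**: optimized entirely through RL, with no heuristics or supervision signals required.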
## References
- [[action-routing-policy|Action Routing Policy]]
- [[action-head-router|Action Head Router]]
- [[tarpo|TARPO]]
- [[hybrid-reasoning|Hybrid Reasoning]]