---
title: "Adaptive Computation Time (ACT)"
created: 2026-05-15
updated: 2026-05-15
type: concept
tags: [neural-architecture, efficiency, computation]
sources: [raw/papers/darlow-ctm-2025.md]
---
# Adaptive Computation Time (ACT)
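**Adaptive Computation Time** is a family of techniques that let a neural network adjust how much computation it spends according to the difficulty of the input.
## Classic approaches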
### ACT (Graves, 2016)
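- Introduces a learnable halting unit
- Outputs a halting probability at every recurrent step
- Stops once the cumulative halting probability exceeds 1 − ε
- Requires a "ponder cost" regularization term to encourage efficiency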
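The halting rule above can be sketched in a few lines. This is a minimal illustration, not Graves' reference implementation: the function name `act_halting`, the default `eps=0.01`, and the convention of forcing a stop on the last available step are assumptions made for the example. The ponder cost is taken as the number of steps plus the remainder assigned to the final step.

```python
def act_halting(halting_probs, eps=0.01):
    """Apply an ACT-style halting rule to one input.

    halting_probs: per-step halting probabilities in [0, 1] (hypothetical
    outputs of the halting unit). Returns (n_steps, weights, ponder_cost).
    """
    if not halting_probs:
        raise ValueError("halting_probs must be non-empty")
    cumulative = 0.0
    weights = []
    for n, h in enumerate(halting_probs, start=1):
        is_last = n == len(halting_probs)
        # Stop when the running sum reaches 1 - eps (or steps run out).
        if cumulative + h >= 1.0 - eps or is_last:
            remainder = 1.0 - cumulative  # leftover mass goes to the final step
            weights.append(remainder)
            return n, weights, n + remainder
        weights.append(h)
        cumulative += h


n_steps, weights, ponder_cost = act_halting([0.1, 0.3, 0.7, 0.9])
print(n_steps, weights, ponder_cost)
```

The weights sum to 1 and would be used to average the per-step states or outputs; an easy input with large early halting probabilities stops sooner and incurs a smaller ponder cost.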
### PonderNet (Banino et al., 2021)
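- Each step emits a conditional halting probability; together these define a (generalized) geometric distribution over the halting step
- Training optimizes the expected loss under this distribution rather than the loss at a single sampled step count
- At inference, halting is sampled step by step from the predicted halting probabilities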
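To make the halting distribution concrete, the sketch below turns per-step conditional halting probabilities λ_n into the probability of halting at exactly step n, p_n = λ_n · ∏_{j<n} (1 − λ_j). The function names and the constant λ = 0.5 are illustrative assumptions, not taken from the PonderNet code.

```python
import random


def halting_distribution(lambdas):
    """Return (probs, leftover), where probs[n-1] is P(halt at step n)."""
    probs = []
    not_halted = 1.0  # probability of having reached this step
    for lam in lambdas:
        probs.append(not_halted * lam)
        not_halted *= 1.0 - lam
    return probs, not_halted  # leftover mass beyond the last step


def sample_halting_step(lambdas, rng):
    """Sample a halting step by flipping one Bernoulli coin per step."""
    for n, lam in enumerate(lambdas, start=1):
        if rng.random() < lam:
            return n
    return len(lambdas)  # assumption: force a halt at the final step


probs, leftover = halting_distribution([0.5, 0.5, 0.5])
step = sample_halting_step([0.5, 0.5, 0.5], random.Random(0))
print(probs, leftover, step)
```

With a constant λ the result is an ordinary truncated geometric distribution; a learned, input-dependent λ_n lets the model shift mass toward earlier or later steps.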
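### Other variants
- **Early-Exit Networks**: attach classifiers to intermediate layers and exit early once a condition is met
- **AdaTape**: dynamically extends the input sequence
- **Sparse Universal Transformer**: weight sharing across recurrent steps, dynamic halting, and MoE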
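## Native ACT in CTM
CTM achieves ACT naturally through [[certainty-based-loss|Certainty-Based Loss]], with no explicit halting module:
- Certainty can serve as the stopping condition
- Easy samples reach high certainty at early ticks
- In the ImageNet experiments, most samples can stop in fewer than 10 ticks, out of 50 ticks in total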
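A certainty-based stopping rule of this kind might look like the sketch below. It assumes certainty is measured as one minus the normalized entropy of the predicted class distribution; the threshold of 0.8 and the function names are hypothetical choices for illustration, not values from the CTM paper.

```python
import math


def certainty(probs):
    """1 - normalized entropy: 0 for a uniform prediction, 1 for one-hot."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def first_confident_tick(per_tick_probs, threshold=0.8):
    """Return the first tick (1-indexed) whose certainty meets the threshold.

    Falls back to the last tick when no tick is confident enough.
    """
    for tick, probs in enumerate(per_tick_probs, start=1):
        if certainty(probs) >= threshold:
            return tick
    return len(per_tick_probs)


# Toy two-class predictions that sharpen over three ticks.
ticks = [[0.5, 0.5], [0.7, 0.3], [0.99, 0.01]]
print(first_confident_tick(ticks))
```

Nothing here is learned specifically for halting: the rule only reads the predictions the model already produces at each tick.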
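## Key difference
In CTM, ACT is an **emergent property** rather than an explicit design: there is no halting module, no ponder cost, and no step-count sampling. This is a core expression of its architectural philosophy: design the loss function and the representation, and "intelligent" behavior emerges naturally.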
## Sources
- Graves, "Adaptive Computation Time for Recurrent Neural Networks", 2016
- [[darlow-ctm-2025|CTM paper]] (NeurIPS 2025)