20260625: lots of new content
65
concepts/agentic-streaming-inference.md
Normal file
@@ -0,0 +1,65 @@
---
title: "Agentic Streaming Inference"
created: 2026-06-20
updated: 2026-06-20
type: concept
tags: ["inference", "streaming", "agent", "framework", "real-time"]
sources: ["https://arxiv.org/abs/2606.17800"]
---
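
# Agentic Streaming Inference

**Agentic Streaming Inference** is the **training-free inference framework** proposed by [[maineCoon|MaineCoon]]: it wraps a frozen generator in three agentic controllers and achieves stable streaming generation on the thousand-second scale without modifying the model weights.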

## Architecture

```
Viewer ← Stream ← [Buffer Controller] → [Frozen Generator + KV-Cache]
                      ↑ Timing           ↑ Memory       ↑ Content
                                  [Cache Manager] ←→ [Director: Planner + Observer]
```

Each of the three controllers has its own job, keeping **content, memory, and timing separate**:

| Controller | Responsibility | Core mechanism |
|--------|------|---------|
| **Director** (Planner + Observer) | Content stream | A Gemma 4 26B agent writes prompts and observes quality |
| **[[agentic-cache-manager|Cache Manager]]** | Memory | Bounded keep-set + drift control |
| **[[look-ahead-buffer-controller|Buffer Controller]]** | Timing / pacing | A pace gate manages the generation lead |
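
## Key Design Principles

### 1. Separation of concerns

- **Agent (Planner/Observer)** handles cognition: what to generate and when, whether the output has degraded, and how to repair it
- **Engine (Generator)** handles execution: it generates continuously at a fixed cadence and is never interrupted
- **Manager (Cache/Buffer)** handles governance: what to remember and when to emit output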
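
The split between the three roles can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen for illustration only: the `Agent`, `Engine`, and `Manager` protocols, their method names, and the toy implementations are not taken from the MaineCoon paper. The point is only that the driving loop wires the roles together while no role reaches into another.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Cognition: decides what to generate next (hypothetical interface)."""
    def next_prompt(self, beat: int) -> str: ...


class Engine(Protocol):
    """Execution: turns one prompt into one chunk (hypothetical interface)."""
    def generate(self, prompt: str) -> str: ...


class Manager(Protocol):
    """Governance: decides what is kept or released (hypothetical interface)."""
    def admit(self, chunk: str) -> None: ...


@dataclass
class EchoAgent:
    def next_prompt(self, beat: int) -> str:
        return f"beat-{beat}"


@dataclass
class UpperEngine:
    def generate(self, prompt: str) -> str:
        return prompt.upper()  # stand-in for the frozen generator


@dataclass
class ListManager:
    released: list[str] = field(default_factory=list)

    def admit(self, chunk: str) -> None:
        self.released.append(chunk)


def run(agent: Agent, engine: Engine, manager: Manager, beats: int) -> None:
    # The loop only connects the roles; each role keeps its own state.
    for beat in range(beats):
        manager.admit(engine.generate(agent.next_prompt(beat)))
```

Because the roles only meet in `run`, any one of them can be swapped (for example a different planner) without touching the other two.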
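
### 2. Never interrupt the stream

- The Generator runs at a fixed cadence and never performs start/stop/step
- All corrections are injected forward through the prompt stream; the stream is never reset
- The Observer inspects the generation head (which runs ahead of playback), so repairs land before the viewer sees the affected content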
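
A toy simulation may help make these three points concrete. It is a sketch under stated assumptions, not the framework's implementation: the function `stream`, the `lead` parameter (how many chunks the generation head stays ahead of playback), and the `corrections` mapping (a stand-in for Observer decisions) are all hypothetical names.

```python
from collections import deque


def stream(prompts: list[str], lead: int, corrections: dict[int, str]) -> list[str]:
    """Simulate a fixed-cadence generator whose head runs `lead` chunks ahead.

    `corrections` maps a tick to a replacement prompt for the next tick,
    standing in for a decision made by an observer (hypothetical).
    """
    pending = list(prompts)
    buffer: deque[str] = deque()  # generated but not yet shown
    shown: list[str] = []
    for tick in range(len(pending)):
        # The generator produces one chunk on every tick, with no pause.
        buffer.append(f"chunk({pending[tick]})")
        # Corrections only rewrite a future prompt; nothing is regenerated.
        if tick in corrections and tick + 1 < len(pending):
            pending[tick + 1] = corrections[tick]
        # Playback releases a chunk only once the head is `lead` chunks ahead.
        if len(buffer) > lead:
            shown.append(buffer.popleft())
    shown.extend(buffer)  # drain what is left at the end of the stream
    return shown
```

With `stream(["a", "b", "c", "d"], lead=2, corrections={1: "fix"})`, the correction issued at tick 1 replaces the prompt for tick 2 while `chunk(a)` has not even been shown yet, so the viewer only ever sees the repaired sequence.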
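
### 3. Graceful degradation

- If segmentation, inspection, or planning fails → fall back to a coarser-grained signal or to a safe continuation
- No failure on the Observer side **can stall the stream**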
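
One common way to get this behavior is a fallback chain that tries signals from finest to coarsest and never raises. The sketch below assumes that pattern; `first_available`, `segment_level`, and `frame_level` are hypothetical names, and the source does not say the framework is implemented this way.

```python
from typing import Callable, Optional

Signal = Optional[str]


def first_available(checks: list[Callable[[], Signal]], safe_default: str) -> str:
    """Try checks from finest to coarsest; never raise to the caller."""
    for check in checks:
        try:
            result = check()
        except Exception:
            continue  # a failed check degrades to the next, coarser one
        if result is not None:
            return result
    return safe_default  # safe continuation when every check failed


def segment_level() -> Signal:
    raise RuntimeError("segmentation failed")  # simulated failure


def frame_level() -> Signal:
    return "frame-level drift: ok"
```

Calling `first_available([segment_level, frame_level], "continue")` returns the frame-level signal even though segmentation raised, and an empty or fully failing chain yields the safe default instead of an exception.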

## Director: Planner + Observer

**Planner** emits a structured prompt on a fixed beat:

```
[VISUAL] character appearance + [SPEECH] dialogue line + [SOUNDS] ambient sound + tags
```

It maintains a bounded planning history and a record of the lines already spoken, so that dialogue is not repeated.
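
A minimal sketch of such a planner is shown below. The class `Planner`, its `plan` method, the candidate-line selection, and the exact string layout of the prompt are assumptions made for illustration; the source only specifies the `[VISUAL]` / `[SPEECH]` / `[SOUNDS]` / tags fields, the bounded history, and the no-repeat requirement. For simplicity the spoken-line record here is an unbounded set.

```python
from collections import deque


class Planner:
    def __init__(self, history_size: int = 8) -> None:
        # Bounded planning history: old prompts fall off automatically.
        self.history: deque[str] = deque(maxlen=history_size)
        # Lines already delivered (unbounded in this sketch, an assumption).
        self.spoken: set[str] = set()

    def plan(self, visual: str, candidates: list[str], sounds: str, tags: list[str]) -> str:
        # Pick the first candidate line that has not been spoken yet.
        speech = next((c for c in candidates if c not in self.spoken), "")
        if speech:
            self.spoken.add(speech)
        prompt = f"[VISUAL] {visual} [SPEECH] {speech} [SOUNDS] {sounds} {' '.join(tags)}".strip()
        self.history.append(prompt)
        return prompt
```

Calling `plan` twice with the same candidates `["Hello", "Bye"]` uses "Hello" on the first beat and "Bye" on the second, and `history` never grows past `history_size`.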

**Observer** watches quality at the generation frontier:

- Five photometric drift metrics (cheap, run on every frame)
- Periodic VLM checks for semantic defects
- Repairs go through the [[forward-repair-ladder|forward-repair ladder]]
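
The two-tier schedule (a cheap check on every frame, an expensive check only periodically) can be sketched as follows. This is an assumption-laden illustration: the source does not name the five photometric metrics, so `mean_brightness` is a single placeholder metric rather than one of them, and `Observer`, `vlm_check`, `vlm_every`, and the defect labels are hypothetical.

```python
from typing import Callable, Optional


def mean_brightness(frame: list[float]) -> float:
    # Placeholder metric; the source does not name its five metrics.
    return sum(frame) / len(frame)


class Observer:
    def __init__(self, reference: list[float], tolerance: float,
                 vlm_check: Callable[[list[float]], bool], vlm_every: int) -> None:
        self.ref = mean_brightness(reference)
        self.tolerance = tolerance
        self.vlm_check = vlm_check    # returns True when the frame looks fine
        self.vlm_every = vlm_every    # run the expensive check every N frames
        self.frames_seen = 0

    def observe(self, frame: list[float]) -> Optional[str]:
        """Return a defect label, or None when the frame looks fine."""
        self.frames_seen += 1
        if abs(mean_brightness(frame) - self.ref) > self.tolerance:
            return "photometric-drift"   # cheap check, every frame
        if self.frames_seen % self.vlm_every == 0 and not self.vlm_check(frame):
            return "semantic-defect"     # expensive check, periodic
        return None
```

A returned label would then be handed to whatever repair policy is in use; that step is outside this sketch.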

**Feeder & Fast Lane**: prompts are queued asynchronously; the fast lane replaces beats that have not been generated yet, without affecting chunks that are already in flight.
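
The replace-only-what-is-still-queued rule can be sketched with a small queue. The sketch is synchronous for brevity, whereas the source describes an asynchronous queue, and the class `Feeder` with its `enqueue`, `take`, and `fast_lane` methods is a hypothetical interface, not the paper's API.

```python
from collections import deque
from typing import Optional


class Feeder:
    def __init__(self) -> None:
        # (beat index, prompt) pairs that have not been generated yet.
        self.queue: deque[tuple[int, str]] = deque()
        self.in_flight: Optional[tuple[int, str]] = None

    def enqueue(self, beat: int, prompt: str) -> None:
        self.queue.append((beat, prompt))

    def take(self) -> Optional[tuple[int, str]]:
        # Called by the generator: the taken beat is now in flight and frozen.
        self.in_flight = self.queue.popleft() if self.queue else None
        return self.in_flight

    def fast_lane(self, beat: int, prompt: str) -> bool:
        # Replace a queued beat in place; refuse if it already left the queue.
        for i, (queued_beat, _) in enumerate(self.queue):
            if queued_beat == beat:
                self.queue[i] = (beat, prompt)
                return True
        return False
```

Once `take()` has handed beat 0 to the generator, `fast_lane(0, ...)` returns `False` and leaves it alone, while `fast_lane(1, ...)` still succeeds for a beat that is waiting in the queue.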

## References

- [[maineCoon|MaineCoon paper]], Section 4
- [[agentic-cache-manager|Agentic Cache Manager]]
- [[look-ahead-buffer-controller|Look-Ahead Buffer Controller]]
- [[forward-repair-ladder|Forward-Repair Ladder]]