---
title: "Unlimited OCR Model"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "attention-mechanism", "long-horizon", "end-to-end", "baidu"]
sources:
- "[[unlimited-ocr-works-2026]]"
---
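# Unlimited OCR

Unlimited OCR is an end-to-end, long-horizon OCR model proposed by Baidu. It takes DeepSeek OCR as its baseline and replaces every decoder attention layer with R-SWA (Reference Sliding Window Attention), which keeps both the KV cache size and the inference speed constant as the output grows.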
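
This note does not spell out how R-SWA works internally, so the sketch below illustrates only the generic sliding-window idea behind a constant-size KV cache, not R-SWA itself. `SlidingWindowKVCache`, `cache_sizes`, and the window size are hypothetical names and values chosen for illustration.

```python
from collections import deque


class SlidingWindowKVCache:
    """Toy KV cache that keeps only the most recent `window` decode steps."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) silently evicts the oldest entry once it is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value) -> None:
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        return len(self.keys)


def cache_sizes(num_tokens: int, window: int):
    """Return (full_attention_entries, windowed_entries) after each decode step."""
    cache = SlidingWindowKVCache(window)
    sizes = []
    for step in range(num_tokens):
        cache.append(f"k{step}", f"v{step}")  # placeholders for real tensors
        sizes.append((step + 1, len(cache)))
    return sizes


if __name__ == "__main__":
    for full, windowed in cache_sizes(num_tokens=8, window=4):
        print(full, windowed)
```

With `window=4`, the first column (what a full-attention cache would hold) keeps growing with every token, while the second plateaus at 4. Each decode step then attends over a bounded number of entries, which is consistent with the constant-speed claim above.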
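## Architecture

- Encoder: inherits DeepEncoder (16× compression, kept frozen during training)
- Decoder: 3B-parameter MoE with 500M activated parameters; all attention layers replaced with R-SWA
- Training: 4,000 steps on 8×16 A800 GPUs, 32K sequence length, DeepEP with expert parallelism EP=4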
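
The setup above can be captured as a small config record, which also makes two derived figures explicit. This is only a sketch: the class and field names are hypothetical, "32K" is assumed to mean 32 × 1024 tokens, and "8×16" is assumed to mean 128 GPUs in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnlimitedOCRSetup:
    """Hypothetical record of the architecture and training figures in this note."""

    encoder: str = "DeepEncoder"
    encoder_compression: int = 16          # "16×" compression
    encoder_frozen: bool = True
    decoder_total_params: float = 3e9      # 3B MoE
    decoder_active_params: float = 500e6   # 500M activated
    attention: str = "R-SWA"
    train_steps: int = 4000
    gpu_layout: tuple = (8, 16)            # "8×16 A800" as written in the note
    seq_len: int = 32 * 1024               # assumption: 32K = 32 × 1024
    expert_parallel: int = 4               # DeepEP EP=4

    @property
    def active_fraction(self) -> float:
        """Share of decoder parameters activated per token."""
        return self.decoder_active_params / self.decoder_total_params

    @property
    def total_gpus(self) -> int:
        """Assumption: the two factors multiply to the total GPU count."""
        return self.gpu_layout[0] * self.gpu_layout[1]


if __name__ == "__main__":
    setup = UnlimitedOCRSetup()
    print(f"active fraction: {setup.active_fraction:.1%}")
    print(f"total GPUs: {setup.total_gpus}")
```

Under these assumptions, roughly one sixth of the decoder's parameters are active for any given token.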
## Core Performance

- OmniDocBench v1.5: 93.23% (+6.22 pp over DeepSeek OCR)
- Long-horizon parsing of 2 to 40+ pages in a single forward pass
- Constant inference TPS (tokens per second), with a 35% lead at 6,000 tokens
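
The OmniDocBench figure gives only the new score and the gain in percentage points, so the baseline score it implies can be recovered by subtraction. A minimal sketch, using `Decimal` to avoid floating-point noise. The function name is hypothetical, and the relative gain is derived here rather than stated in the source.

```python
from decimal import Decimal


def implied_baseline(score: str, gain_pp: str) -> Decimal:
    """Baseline score implied by a reported score and a gain in percentage points."""
    return Decimal(score) - Decimal(gain_pp)


baseline = implied_baseline("93.23", "6.22")

# Relative improvement over the implied baseline, in percent, to two decimals.
relative_gain = (Decimal("6.22") / baseline * 100).quantize(Decimal("0.01"))

if __name__ == "__main__":
    print(baseline)        # 87.01
    print(relative_gain)
```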
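## Cognitive Inspiration

When transcribing a long document, humans attend only to the nearby context rather than revisiting everything written so far. R-SWA's soft forgetting mirrors this behavior.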
## References
- [[unlimited-ocr-works-2026]]
- [[reference-sliding-window-attention]]
- [[deepseek-ocr]]
- [[deepencoder]]
- [[constant-kv-cache]]
- [[long-horizon-parsing]]