---
title: "Unlimited OCR Model"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "attention-mechanism", "long-horizon", "end-to-end", "baidu"]
sources:
- "[[unlimited-ocr-works-2026]]"
---
# Unlimited OCR
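Unlimited OCR is an end-to-end, long-horizon OCR model proposed by Baidu. Taking DeepSeek OCR as its baseline, it replaces every decoder attention layer with R-SWA (reference sliding window attention), which yields a constant-size KV cache and constant inference speed.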
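
To make the constant KV cache claim concrete, here is a minimal sketch of a bounded cache for a sliding-window decoder. It models a generic sliding window, not R-SWA itself, whose exact retention rule is not described in this note. The class name `SlidingKVCache`, the `window` parameter, and the string placeholders for keys and values are all assumptions made for illustration.

```python
from collections import deque


class SlidingKVCache:
    """Hypothetical bounded KV cache: keeps only the most recent `window` entries."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.entries = deque(maxlen=window)

    def append(self, key, value) -> None:
        self.entries.append((key, value))

    def __len__(self) -> int:
        return len(self.entries)


def cache_sizes(num_tokens: int, window: int) -> list:
    """Return the cache length observed after each decoded token."""
    cache = SlidingKVCache(window)
    sizes = []
    for t in range(num_tokens):
        cache.append(f"k{t}", f"v{t}")  # stand-ins for real key/value tensors
        sizes.append(len(cache))
    return sizes
```

For example, `cache_sizes(6, 3)` returns `[1, 2, 3, 3, 3, 3]`: the cache grows until it reaches the window size and then stays flat, however many tokens follow.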
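## Architecture
- Inherits DeepEncoder (16× compression, frozen during training)
- Decoder: 3B MoE (500M active parameters), with all attention layers replaced by R-SWA
- Training: 4000 steps, 8×16 A800 GPUs, 32K sequence length, DeepEP with EP=4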
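
The figures in the list above can be collected into a small config object to check the derived quantities. This is only a bookkeeping sketch: the class and field names are hypothetical, "8×16" is read as a product of two counts, "32K" is assumed to mean 32 × 1024 tokens, and the expert-parallel group count assumes every GPU belongs to exactly one group of size EP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnlimitedOCRConfig:
    """Figures copied from this note; all field names are hypothetical."""

    total_params: float = 3e9      # decoder size (3B MoE)
    active_params: float = 500e6   # parameters active per token
    train_steps: int = 4000
    gpu_grid: tuple = (8, 16)      # "8×16 A800", read as a product of counts
    seq_len: int = 32 * 1024       # assumes 32K means 32 * 1024 tokens
    expert_parallel: int = 4       # DeepEP EP=4

    def active_fraction(self) -> float:
        """Share of decoder parameters used for each token."""
        return self.active_params / self.total_params

    def num_gpus(self) -> int:
        return self.gpu_grid[0] * self.gpu_grid[1]

    def expert_parallel_groups(self) -> int:
        # Assumption: GPUs are partitioned into disjoint groups of size EP.
        return self.num_gpus() // self.expert_parallel
```

Under these assumptions, one sixth of the decoder parameters are active per token and the run uses 128 GPUs in total.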
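## Key Performance
- OmniDocBench v1.5: 93.23% (+6.22 pp over DeepSeek OCR)
- Long-horizon parsing of 2 to 40+ pages in a single forward pass
- Constant inference TPS (tokens per second), with a 35% lead at 6000 tokens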
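
One way to see why throughput can stay flat is to count how many keys each decode step has to read. The sketch below compares full causal attention with a plain sliding window. The function names and the `window` parameter are hypothetical, and the model ignores everything except attention reads, so it illustrates the trend rather than reproducing the 35% figure.

```python
from typing import Optional


def keys_attended(step: int, window: Optional[int] = None) -> int:
    """Keys one decoder query reads at a 1-indexed decode step.

    window=None models full causal attention; an integer models a plain
    sliding window (an assumption; R-SWA's exact rule is not given here).
    """
    if step < 1:
        raise ValueError("step is 1-indexed")
    return step if window is None else min(step, window)


def total_attention_work(num_tokens: int, window: Optional[int] = None) -> int:
    """Sum of keys read over a whole decode of `num_tokens` tokens."""
    return sum(keys_attended(t, window) for t in range(1, num_tokens + 1))
```

With a hypothetical window of 1024, `keys_attended(6000)` is 6000 while `keys_attended(6000, 1024)` is 1024: the per-step cost of full attention keeps growing with output length, whereas the windowed cost is capped once the window fills.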
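## Cognitive Inspiration
When transcribing a long document, humans attend only to nearby context and do not look back over the entire history. The soft forgetting in R-SWA is consistent with this behavior.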
## References
- [[unlimited-ocr-works-2026]]
- [[reference-sliding-window-attention]]
- [[deepseek-ocr]]
- [[deepencoder]]
- [[constant-kv-cache]]
- [[long-horizon-parsing]]