---
title: "End-to-End OCR"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "end-to-end", "vlm", "document-parsing"]
sources:
- "[[unlimited-ocr-works-2026]]"
---
# End-to-End OCR
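End-to-End OCR is an OCR paradigm that merges text detection and text recognition into a single unified model. It relies on the strong decoding capability of VLMs/LLMs to parse the content of an entire page in a single forward pass.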
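## Comparison with the Pipeline Paradigm
| Dimension | Pipeline OCR | End-to-End OCR |
|------|-------------|----------------|
| Architecture | Detection model + multiple recognition models + heuristic strategies | Single unified model |
| Decoding passes | Multiple (detect → crop → recognize) | Single |
| Model requirements | Low | High (needs greater model capacity) |
| Training difficulty | Low | High |
| Implications for VLM development | Limited | Can directly drive progress in general-purpose VLMs |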
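
The control-flow difference between the two paradigms can be illustrated with a toy sketch. Everything here is hypothetical: `toy_detect`, `toy_recognize`, and `toy_model` are stand-ins for real models, the `Box` format and the top-to-bottom reading-order heuristic are assumptions, and `CallCounter` only exists to make the number of model invocations visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); assumed format
Image = List[List[int]]          # toy grayscale image as nested lists


@dataclass
class CallCounter:
    """Counts how many model invocations a strategy performs."""
    calls: int = 0


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def pipeline_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
    counter: CallCounter,
) -> str:
    """Pipeline style: detect regions, crop each one, recognize each crop."""
    counter.calls += 1  # one detection call
    boxes = detect(image)
    # Heuristic step (assumed): order regions top-to-bottom, then left-to-right.
    boxes = sorted(boxes, key=lambda b: (b[1], b[0]))
    lines = []
    for box in boxes:
        counter.calls += 1  # one recognition call per detected region
        lines.append(recognize(crop(image, box)))
    return "\n".join(lines)


def end_to_end_ocr(
    image: Image,
    model: Callable[[Image], str],
    counter: CallCounter,
) -> str:
    """End-to-end style: a single model call maps the whole page to text."""
    counter.calls += 1
    return model(image)


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_detect(image: Image) -> List[Box]:
    return [(0, 2, 4, 4), (0, 0, 4, 2)]  # deliberately out of reading order


def toy_recognize(region: Image) -> str:
    return f"line({len(region)}x{len(region[0])})"


def toy_model(image: Image) -> str:
    return "line(2x4)\nline(2x4)"
```

With a 4×4 toy image and two detected regions, `pipeline_ocr` makes three model calls (one detection plus two recognitions), while `end_to_end_ocr` makes one. The pipeline's call count grows with the number of regions on the page; the end-to-end call count does not.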
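## Core Modules
1. **High-compression Encoder** (e.g., [[deepencoder]]): extracts and compresses image information, and sets the upper bound on decoding efficiency
2. **High-efficiency Decoder** (e.g., R-SWA): directly affects inference cost and the upper limit on generation length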
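
A rough back-of-the-envelope sketch shows why these two modules matter for cost. This note does not specify the encoder's patch size or compression ratio, nor how R-SWA works, so everything below is an illustrative assumption: a patch-based encoder followed by a fixed-ratio token compressor, and a generic fixed-window KV cache standing in for an efficient decoder.

```python
import math
from typing import Optional


def vision_token_count(height: int, width: int, patch_size: int, compression: int) -> int:
    """Tokens handed to the decoder for one page image.

    Assumes square patches and a compressor that reduces the patch-token
    count by a fixed ratio (both are assumptions, not taken from this note).
    """
    patches = math.ceil(height / patch_size) * math.ceil(width / patch_size)
    return math.ceil(patches / compression)


def cached_positions(sequence_length: int, window: Optional[int] = None) -> int:
    """Positions one decoder layer keeps in its KV cache.

    window=None models full attention (the cache grows with the sequence);
    an integer window models a generic sliding-window cache (assumed here
    purely for illustration).
    """
    if window is None:
        return sequence_length
    return min(sequence_length, window)
```

Under these assumptions, a 1024×1024 page with 16×16 patches yields 4096 patch tokens, and a 16× compressor reduces that to 256 tokens for the decoder to attend over. On the decoder side, a full-attention cache for a 10,000-token output holds 10,000 positions per layer, whereas a 1,024-position window caps it at 1,024, which is one way a decoder design can bound memory as generation length grows.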
## Current SOTA
Unlimited OCR v1.5 (93.23%) and v1.6 (93.54%), DeepSeek OCR 2, Qianfan-OCR, Logics-Parsing-v2, and others.
## References
- [[unlimited-ocr-works-2026]]
- [[deepseek-ocr]]
- [[deepencoder]]
- [[omnidocbench]]