20260625: lots of new content
38
concepts/end-to-end-ocr.md
Normal file
@@ -0,0 +1,38 @@
---
title: "End-to-End OCR"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "end-to-end", "vlm", "document-parsing"]
sources:
  - "[[unlimited-ocr-works-2026]]"
---

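# End-to-End OCR

End-to-End OCR is an OCR paradigm that merges text detection and recognition into a single unified model, leveraging the strong decoding capability of VLMs/LLMs to parse an entire page in a single forward pass.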
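
## Comparison with the Pipeline Paradigm

| Dimension | Pipeline OCR | End-to-End OCR |
|-----------|--------------|----------------|
| Architecture | Detection model + multiple recognition models + heuristic strategies | Single unified model |
| Decoding passes | Multiple (detect → crop → recognize) | Single |
| Model requirements | Low | High (needs greater model capacity) |
| Training difficulty | Low | High |
| Implications for VLM development | Limited | Can directly drive progress in general-purpose VLMs |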
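
The difference in decoding passes can be illustrated with a toy sketch. Everything here is hypothetical: `pipeline_ocr`, `end_to_end_ocr`, and `CallCounter` are illustrative names, and `str.upper()` merely stands in for a real recognizer. The point is only the control flow: the pipeline invokes a model once for detection plus once per cropped region, while the end-to-end model is invoked once per page.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts model invocations so the two paradigms can be compared."""
    calls: int = 0

    def tick(self) -> None:
        self.calls += 1


def pipeline_ocr(page_regions: list[str], counter: CallCounter) -> str:
    """Pipeline sketch: one detection call, then one recognition call per region."""
    counter.tick()                    # detection model locates the regions
    texts = []
    for region in page_regions:       # each region is cropped and recognized separately
        counter.tick()
        texts.append(region.upper())  # stand-in for a real recognition model
    return "\n".join(texts)           # heuristic step: join in reading order


def end_to_end_ocr(page_regions: list[str], counter: CallCounter) -> str:
    """End-to-end sketch: a single model call parses the whole page."""
    counter.tick()
    return "\n".join(region.upper() for region in page_regions)


# A made-up page with three text regions.
page = ["title", "paragraph one", "table cell"]
p_counter, e_counter = CallCounter(), CallCounter()
pipeline_out = pipeline_ocr(page, p_counter)
e2e_out = end_to_end_ocr(page, e_counter)
print(p_counter.calls, e_counter.calls)  # 4 1
```

In this toy setup both paths return the same text; what changes is the number of model invocations, which grows with the region count only in the pipeline case.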
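
## Core Modules

1. **High-compression Encoder** (e.g. [[deepencoder]]): extracts and compresses image information, setting the upper bound on decoding efficiency
2. **High-efficiency Decoder** (e.g. R-SWA): directly affects inference cost and the maximum generation length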
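
To see why encoder compression bounds what the decoder can do, here is a back-of-the-envelope sketch. The patch size, compression ratio, and context window below are illustrative assumptions, not figures taken from the source, and `visual_token_count` and `max_output_tokens` are hypothetical helper names.

```python
def visual_token_count(width: int, height: int,
                       patch_size: int = 16, compression: int = 16) -> int:
    """Visual tokens handed to the decoder after patching and compression.

    patch_size and compression are assumed values for illustration only.
    """
    patches = (width // patch_size) * (height // patch_size)
    return patches // compression


def max_output_tokens(context_window: int, visual_tokens: int) -> int:
    """Tokens left for generated text once the visual tokens are in context."""
    return max(context_window - visual_tokens, 0)


# Assumed example: a 1024x1024 page image and an 8192-token context window.
tokens = visual_token_count(1024, 1024)
print(tokens)                           # 256
print(max_output_tokens(8192, tokens))  # 7936
```

Under these assumptions, a stronger compression ratio leaves more of the context window for generated text, which is one way to read the claim that the encoder sets the ceiling while the decoder determines cost and output length.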

## Current SOTA

Unlimited OCR (v1.5: 93.23%, v1.6: 93.54%), DeepSeek OCR 2, Qianfan-OCR, Logics-Parsing-v2, and others.

## References

- [[unlimited-ocr-works-2026]]
- [[deepseek-ocr]]
- [[deepencoder]]
- [[omnidocbench]]