20260625: lots of new content
41
concepts/long-horizon-parsing.md
Normal file
@@ -0,0 +1,41 @@
---
title: "Long-Horizon Parsing"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "long-horizon", "parsing", "document-understanding"]
sources:
- "[[unlimited-ocr-works-2026]]"
---

# Long-Horizon Parsing

Long-Horizon Parsing refers to the OCR capability of parsing a multi-page or long document in a single forward pass, as opposed to the traditional page-by-page for-loop approach.
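
## Problem

Existing OCR models process documents in a page-by-page for-loop and reset their memory on every page, which fragments one coherent long-horizon process into isolated short tasks. Humans, by contrast, maintain a continuous cognitive state during long transcription work, with no loss of efficiency.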
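
The two call patterns described above can be sketched as follows. This is a toy illustration only: `ToyParser` and its `parse` method are hypothetical stand-ins, not the Unlimited OCR API, and "context" here is simply the number of earlier pages visible when a page is parsed.

```python
from typing import List


class ToyParser:
    """Hypothetical stand-in for an OCR model; not a real API."""

    def parse(self, pages: List[str]) -> List[int]:
        # Context starts empty on every call, mimicking a per-call memory reset.
        context: List[str] = []
        seen_before: List[int] = []
        for page in pages:
            # Record how many earlier pages are available as context for this page.
            seen_before.append(len(context))
            context.append(page)
        return seen_before


def page_by_page(parser: ToyParser, pages: List[str]) -> List[int]:
    # One call per page: every page is parsed with no memory of earlier pages.
    out: List[int] = []
    for page in pages:
        out.extend(parser.parse([page]))
    return out


def long_horizon(parser: ToyParser, pages: List[str]) -> List[int]:
    # One call for the whole document: context accumulates across pages.
    return parser.parse(pages)


pages = ["p1", "p2", "p3", "p4"]
print(page_by_page(ToyParser(), pages))  # [0, 0, 0, 0]
print(long_horizon(ToyParser(), pages))  # [0, 1, 2, 3]
```

In the loop variant every page sees zero pages of prior context, which is the fragmentation the paragraph above describes.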
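
## Unlimited OCR's Approach

Using R-SWA (reference sliding-window attention) together with DeepEncoder's high compression ratio (16×):
- 10K visual tokens ≈ 20-30 pages (1024×1024)
- 10K visual tokens → ~100K text tokens decoded
- A constant KV cache and constant TPS (tokens per second) sustain decoding of the full output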
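
As a rough back-of-the-envelope check, the figures above imply the following per-page budgets. The sketch assumes "10K" and "100K" mean 10,000 and 100,000 and treats the approximate values as exact; the variable names are illustrative and do not come from the source.

```python
# Figures quoted in the list above (treated as exact for the arithmetic).
VISUAL_TOKENS = 10_000           # visual token budget
TEXT_TOKENS = 100_000            # approximate decoded text tokens
PAGES_LOW, PAGES_HIGH = 20, 30   # pages covered by the visual budget

# Visual tokens per page: more pages in the budget means fewer tokens per page.
visual_per_page = (VISUAL_TOKENS / PAGES_HIGH, VISUAL_TOKENS / PAGES_LOW)

# Text tokens decoded per visual token.
text_per_visual = TEXT_TOKENS / VISUAL_TOKENS

# Decoded text tokens per page.
text_per_page = (TEXT_TOKENS / PAGES_HIGH, TEXT_TOKENS / PAGES_LOW)

print(f"visual tokens per page: {visual_per_page[0]:.0f} to {visual_per_page[1]:.0f}")  # 333 to 500
print(f"text tokens per visual token: {text_per_visual:.0f}")                           # 10
print(f"text tokens per page: {text_per_page[0]:.0f} to {text_per_page[1]:.0f}")        # 3333 to 5000
```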

## Evaluation

On test sets of books, documents, and papers with 2/5/10/20/40+ pages:
- Distinct-n > 96% (content diversity is preserved)
- Edit Distance < 0.11 (high accuracy)
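
For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed. The source does not specify the tokenization, the value of n, or how edit distance is normalized, so whitespace tokens and division by the longer string's length are assumptions made for illustration.

```python
from typing import Sequence


def distinct_n(tokens: Sequence[str], n: int) -> float:
    """Fraction of n-grams that are unique; 1.0 means no n-gram repeats."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length (0.0 = identical)."""
    if not pred and not ref:
        return 0.0
    # prev[j] holds the distance between the processed prefix of pred and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, pc in enumerate(pred, start=1):
        curr = [i]
        for j, rc in enumerate(ref, start=1):
            cost = 0 if pc == rc else 1
            curr.append(min(prev[j] + 1,          # delete a character from pred
                            curr[j - 1] + 1,      # insert a character into pred
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1] / max(len(pred), len(ref))


looping = "the cat sat the cat sat the cat sat".split()
print(distinct_n(looping, 2))                         # 0.375
print(normalized_edit_distance("kitten", "sitting"))  # 3 edits over 7 characters
```

A low Distinct-n flags output that has fallen into a repetition loop, which is a typical failure mode of very long decodes; a low normalized edit distance means the transcription stays close to the reference text.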
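
## Difference from General Long-Horizon

This concept refers specifically to **long-horizon capability in parsing/transcription tasks** (OCR/ASR/translation), as distinct from long-horizon planning or utility modeling in reinforcement learning.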

## References
- [[unlimited-ocr-works-2026]]
- [[reference-sliding-window-attention]]
- [[deepencoder]]
- [[long-horizon-utility]]
- [[long-horizon-evaluation]]