---
title: "Long-Horizon Parsing"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["ocr", "long-horizon", "parsing", "document-understanding"]
sources:
- "[[unlimited-ocr-works-2026]]"
---
# Long-Horizon Parsing
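Long-Horizon Parsing is the OCR capability of parsing a multi-page or long document in a single forward pass, as opposed to the traditional page-by-page for-loop processing mode.
## Problem
Existing OCR models process documents in a page-by-page for-loop and reset their memory on every page, which fragments a coherent long-horizon process into isolated short tasks. Humans, by contrast, maintain a continuous cognitive state during long transcription work, with no loss of efficiency.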
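The structural difference between the two modes can be sketched in a few lines. This is a toy illustration only: `toy_model` is a hypothetical stand-in for an OCR model, not any real API, and all it does is report how many pages were visible in its context when it transcribed each page.

```python
from typing import Callable, List

# Hypothetical model interface: takes the pages visible in one call
# and returns one transcription string per page.
Model = Callable[[List[str]], List[str]]


def toy_model(pages: List[str]) -> List[str]:
    """Stand-in for an OCR model (an assumption, not a real library call).

    It annotates each page with the number of pages it could see at once,
    which makes the effect of resetting context easy to observe.
    """
    visible = len(pages)
    return [f"{page} [context={visible} page(s)]" for page in pages]


def parse_page_by_page(pages: List[str], model: Model = toy_model) -> List[str]:
    """Traditional mode: one independent call per page, nothing carried over."""
    outputs: List[str] = []
    for page in pages:
        outputs.extend(model([page]))  # fresh call, context is a single page
    return outputs


def parse_long_horizon(pages: List[str], model: Model = toy_model) -> List[str]:
    """Long-horizon mode: the whole document is handed over in one call."""
    return model(pages)
```

Calling both functions on the same three pages shows `context=1` on every line for the for-loop version and `context=3` for the long-horizon version, which is the fragmentation this section describes.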
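## Unlimited OCR's Approach
Unlimited OCR combines R-SWA with DeepEncoder's high compression ratio (16×):
- 10K visual tokens ≈ 20-30 pages (1024×1024)
- 10K visual tokens → ~100K decoded text tokens
- Constant KV cache and constant TPS (tokens per second) sustain decoding of the full output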
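The figures in the list above imply a rough per-page budget. The sketch below only rearranges the numbers quoted in this note (10K visual tokens, 20-30 pages, ~100K text tokens); the constant and function names are illustrative, and the 16× compression ratio is not used because the note does not give the patch size it applies to.

```python
# Figures quoted in this note; the ~100K value is approximate in the source.
VISUAL_TOKENS = 10_000
TEXT_TOKENS = 100_000
PAGES_LOW, PAGES_HIGH = 20, 30


def budget() -> dict:
    """Derive per-page token budgets from the quoted totals.

    Each range is (value at 30 pages, value at 20 pages).
    """
    return {
        "visual_per_page": (VISUAL_TOKENS / PAGES_HIGH, VISUAL_TOKENS / PAGES_LOW),
        "text_per_page": (TEXT_TOKENS / PAGES_HIGH, TEXT_TOKENS / PAGES_LOW),
        "text_per_visual": TEXT_TOKENS / VISUAL_TOKENS,
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(name, value)
```

Under these numbers each page costs roughly 333 to 500 visual tokens and yields roughly 3,300 to 5,000 text tokens, so the decoder emits about ten text tokens per visual token. That ratio is why a cache that stays constant in size matters for decoding the whole output.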
## Evaluation
On test sets of books, documents, and papers with 2/5/10/20/40+ pages:
- Distinct-n > 96% (content diversity is preserved)
- Edit Distance < 0.11 (high accuracy)
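For reference, both metrics can be computed as below. This is a minimal sketch using common definitions: Distinct-n as the share of unique n-grams, and edit distance as Levenshtein distance normalized by the longer string. The note does not state the exact variants, tokenization, or value of n used in the evaluation, so those choices are assumptions.

```python
from typing import List, Sequence


def distinct_n(tokens: Sequence[str], n: int) -> float:
    """Fraction of n-grams that are unique; repetition loops push it toward 0."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length: 0.0 is exact, 1.0 is disjoint."""
    longest = max(len(pred), len(ref))
    return edit_distance(pred, ref) / longest if longest else 0.0
```

A high Distinct-n over a long output indicates the decoder has not fallen into repeating itself, while a low normalized edit distance indicates the transcription stays close to the reference text.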
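## Difference from General Long-Horizon
This concept refers specifically to **long-horizon capability in parsing/transcription tasks** (OCR, ASR, translation), as distinct from long-horizon planning and utility modeling in reinforcement learning.
## References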
- [[unlimited-ocr-works-2026]]
- [[reference-sliding-window-attention]]
- [[deepencoder]]
- [[long-horizon-utility]]
- [[long-horizon-evaluation]]