---
title: "Round-Trip Reconstruction Score (RS@k)"
created: 2026-05-14
type: concept
tags: ["evaluation-metric", "semantic-equivalence", "reconstruction", "delegate-52"]
sources: ["https://arxiv.org/abs/2604.15597"]
---
# Round-Trip Reconstruction Score (RS@k)
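RS@k is the core evaluation metric in [[delegate-52]]. It measures how well a document is reconstructed, relative to its original state, after k delegated interactions.
## Definition
In a [[backtranslation-round-trip-relay|backtranslation relay]], k interactions correspond to k/2 backtranslations (round trips). RS@k is defined as:
RS@k(s) = sim(s, ŝ_{k/2})
where s is the original document, ŝ_{k/2} is its reconstruction after k/2 backtranslations, and sim is a domain-specific [[semantic-equivalence|semantic equivalence]] function with values in [0, 1].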
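The definition can be sketched in Python as follows. This is a minimal illustration, not the benchmark's implementation: the names `rs_at_k`, `forward`, `backward`, and `toy_sim` are hypothetical, the relay is assumed to feed each reconstruction into the next round trip, and `toy_sim` is a character-level stand-in for the domain-specific sim function.

```python
from difflib import SequenceMatcher
from typing import Callable


def toy_sim(original: str, reconstructed: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the benchmark's domain-specific sim."""
    return SequenceMatcher(None, original, reconstructed).ratio()


def rs_at_k(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    k: int,
    sim: Callable[[str, str], float] = toy_sim,
) -> float:
    """Compare the source with its reconstruction after k/2 round trips."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number of interactions")
    current = source
    for _ in range(k // 2):
        # One round trip = two interactions: a forward step, then a backward step.
        current = backward(forward(current))
    return sim(source, current)


# Hypothetical lossy relay: every forward step drops the last character.
score = rs_at_k("abcdefghij", lambda s: s[:-1], lambda s: s, k=4)
```

A lossless relay scores 1.0 under this sketch; any information lost in a round trip carries into all later ones, which is why the score is always computed against the original `source` rather than the previous reconstruction.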
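## Interpretation
- **RS@2**: performance after 1 backtranslation (short interaction)
- **RS@20**: performance after 10 backtranslations (main experiments)
- **RS@100**: performance after 50 backtranslations (extended experiments)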
## Ready threshold
A model is considered "ready" for [[delegated-work|delegated work]] in a given domain when its RS@20 in that domain is ≥ 98%.
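The threshold rule could be applied per domain along these lines. This is a sketch under assumptions: `ready_domains` and the domain names are hypothetical, and scores are taken to be RS@20 values on a percent scale.

```python
READY_THRESHOLD = 98.0  # RS@20 threshold in percent, as stated above


def ready_domains(rs_at_20_by_domain: dict[str, float]) -> list[str]:
    """Return the domains whose RS@20 (in percent) meets the ready threshold."""
    return sorted(
        domain
        for domain, score in rs_at_20_by_domain.items()
        if score >= READY_THRESHOLD
    )


# Hypothetical scores for illustration only; not results from the benchmark.
example = ready_domains({"domain_a": 98.4, "domain_b": 71.5})
```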
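## Degradation trajectory across interactions
Taking GPT 5.4 as an example: RS@2 = 94.3% → RS@10 = 79.4% → RS@20 = 71.5%
The decline is monotonic and nonlinear, with no sign of a plateau.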
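The nonlinearity can be checked with simple arithmetic on the three reported points. The sketch below computes the average drop per interaction on each segment; `per_interaction_drop` is a hypothetical helper, and three points are too few to characterize the curve beyond this.

```python
# Reported GPT 5.4 scores (percent), keyed by number of interactions k.
trajectory = {2: 94.3, 10: 79.4, 20: 71.5}


def per_interaction_drop(points: dict[int, float]) -> list[tuple[int, int, float]]:
    """Average score lost per interaction between consecutive measured k values."""
    ks = sorted(points)
    return [
        (a, b, round((points[a] - points[b]) / (b - a), 2))
        for a, b in zip(ks, ks[1:])
    ]


drops = per_interaction_drop(trajectory)
# drops == [(2, 10, 1.86), (10, 20, 0.79)]
```

Both segment slopes are positive drops, so the sequence is monotonically decreasing, and the two rates differ, so the decline is not linear in k. The score is still falling by roughly 0.79 points per interaction at k = 20, which matches the observation that no plateau is visible.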
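## Related concepts
- [[delegate-52]]: the benchmark that uses this metric
- [[backtranslation-round-trip-relay]]: the method that produces this metric
- [[semantic-equivalence]]: the implementation of the sim function
- [[document-degradation]]: the phenomenon revealed by declining RS@k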