20260514: Add new content
41
concepts/round-trip-reconstruction-score.md
Normal file
@@ -0,0 +1,41 @@
---
title: "Round-Trip Reconstruction Score (RS@k)"
created: 2026-05-14
type: concept
tags: ["evaluation-metric", "semantic-equivalence", "reconstruction", "delegate-52"]
sources: ["https://arxiv.org/abs/2604.15597"]
---

# Round-Trip Reconstruction Score (RS@k)

RS@k is the core evaluation metric in [[delegate-52]]. It measures how well a document is reconstructed, relative to its original state, after k delegated interactions.
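
## Definition

In the [[backtranslation-round-trip-relay|backtranslation round-trip relay]], k interactions correspond to k/2 backtranslations (round trips). RS@k is defined as:

RS@k(s) = sim(s, ŝ_{k/2})

where s is the original document, ŝ_{k/2} is the document reconstructed after k/2 round trips, and sim is a domain-specific [[semantic-equivalence|semantic equivalence]] function with values in [0, 1].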
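
The definition can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's implementation: `rs_at_k`, `toy_sim`, and `lossy_round_trip` are hypothetical names, the similarity function is a character-level stand-in from the standard library rather than a domain-specific semantic equivalence function, and the sketch assumes each round trip consumes the output of the previous one.

```python
from difflib import SequenceMatcher
from typing import Callable


def rs_at_k(
    original: str,
    round_trip: Callable[[str], str],
    sim: Callable[[str, str], float],
    k: int,
) -> float:
    """Return sim(s, s_hat_{k/2}) after chaining k/2 round trips."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number of interactions")
    current = original
    for _ in range(k // 2):
        # Assumption: each round trip starts from the previous output (relay),
        # not from the original document.
        current = round_trip(current)
    return sim(original, current)


def toy_sim(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the benchmark's domain-specific sim."""
    return SequenceMatcher(None, a, b).ratio()


def lossy_round_trip(doc: str) -> str:
    """Hypothetical lossy round trip: drops the final character each time."""
    return doc[:-1]


doc = "the quick brown fox jumps over the lazy dog"
rs2 = rs_at_k(doc, lossy_round_trip, toy_sim, k=2)    # 1 round trip
rs20 = rs_at_k(doc, lossy_round_trip, toy_sim, k=20)  # 10 round trips
print(rs2, rs20)
```

With a lossless round trip (the identity function) the score is exactly 1.0. With the lossy stand-in, the score at k = 20 is lower than at k = 2, because the loss compounds across round trips.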
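
## Interpretation

- **RS@2**: performance after 1 backtranslation (short interaction)
- **RS@20**: performance after 10 backtranslations (used in the main experiments)
- **RS@100**: performance after 50 backtranslations (used in the extended experiments)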

## Ready threshold

A model is considered "ready" for [[delegated-work|delegated work]] in a given domain when its RS@20 in that domain is at least 98%.
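
As a small sketch of how the threshold might be applied, the check below takes RS@20 as a percentage and compares it with 98. The function name `is_ready`, the domain names, and the scores are hypothetical and only illustrate the rule.

```python
READY_THRESHOLD = 98.0  # percent, from the RS@20 >= 98% rule


def is_ready(rs20_percent: float, threshold: float = READY_THRESHOLD) -> bool:
    """True when an RS@20 score (in percent) meets the readiness threshold."""
    if not 0.0 <= rs20_percent <= 100.0:
        raise ValueError("RS@20 must be a percentage in [0, 100]")
    return rs20_percent >= threshold


# Hypothetical per-domain RS@20 scores, for illustration only.
scores = {"domain_a": 98.6, "domain_b": 71.5}
ready = {domain: is_ready(score) for domain, score in scores.items()}
print(ready)
```

Readiness is evaluated per model and per domain, so the same model can be ready in one domain and not in another.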

## Degradation trajectory across interactions

Taking GPT 5.4 as an example: RS@2 = 94.3 → RS@10 = 79.4 → RS@20 = 71.5
The decline is monotonic and nonlinear, with no sign of a plateau.
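
The shape of this trajectory can be checked with simple arithmetic on the three quoted checkpoints. The sketch below computes the average change per interaction on each segment; the variable names are illustrative, and three points can only suggest a trend, not establish one.

```python
# RS@k checkpoints for GPT 5.4 as quoted in this note (k -> score).
trajectory = {2: 94.3, 10: 79.4, 20: 71.5}

points = sorted(trajectory.items())
# Average change in score per interaction between consecutive checkpoints.
slopes = [
    (s1 - s0) / (k1 - k0)
    for (k0, s0), (k1, s1) in zip(points, points[1:])
]

is_monotone_decreasing = all(slope < 0 for slope in slopes)
# A linear decline would give (approximately) equal slopes on every segment.
is_nonlinear = max(slopes) - min(slopes) > 1e-6

for (k0, _), (k1, _), slope in zip(points, points[1:], slopes):
    print(f"RS@{k0} -> RS@{k1}: {slope:.2f} points per interaction")
```

The score drops by about 1.86 points per interaction between RS@2 and RS@10, and by about 0.79 points per interaction between RS@10 and RS@20. The loss slows down but is still clearly negative on the last segment, which is what "nonlinear, monotonic, no plateau" describes.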

## Related concepts

- [[delegate-52]]: the benchmark that uses this metric
- [[backtranslation-round-trip-relay]]: the method that produces this metric
- [[semantic-equivalence]]: the implementation of the sim function
- [[document-degradation]]: the phenomenon revealed by declining RS@k