---
title: "Bidirectional Trajectory Evaluation"
domain: "Reinforcement Learning / Reward Design"
tags: [trajectory, evaluation, path-tracing, reward]
sources: [[thinking-with-visual-primitives]]
---
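# Bidirectional Trajectory Evaluation
> The core evaluation method for path-tracing tasks: trajectory alignment is computed in both directions at once, prediction → ground truth (forward) and ground truth → prediction (reverse).
## Why both directions are needed
Each single-direction evaluation has a blind spot:
- **Forward only**: the model can output just a few safe points near the start → it scores well, but the path is incomplete
- **Reverse only**: does not penalize detours that the model hallucinates away from the true path
**Combining both directions** → pushes the model to output a coordinate trajectory that is **complete and accurate**.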
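## Bidirectional computation
### Forward
For each **predicted point**, compute the minimum distance to any segment of the **ground-truth curve**, then average over all predicted points.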
$$\text{Forward} = \frac{1}{N_{\text{pred}}} \sum_{p \in \text{pred}} \min_{s \in \text{GT}} \text{dist}(p, s)$$
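→ Penalizes points that stray from the ground-truth path.
### Reverse
For each **ground-truth point**, compute the minimum distance to any segment of the **predicted polyline**, then average over all ground-truth points.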
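
The formula above leaves $\text{dist}(p, s)$ unspecified. One common reading, assumed here rather than taken from the source, is the Euclidean distance from $p$ to the closest point on a segment $s$ with endpoints $a \neq b$, obtained by clamping the projection parameter $t^*$ to $[0, 1]$:

$$t^* = \min\left(1, \max\left(0, \frac{(p - a) \cdot (b - a)}{\lVert b - a \rVert^2}\right)\right), \qquad \text{dist}(p, s) = \lVert p - (a + t^*(b - a)) \rVert$$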
$$\text{Reverse} = \frac{1}{N_{\text{GT}}} \sum_{g \in \text{GT}} \min_{s \in \text{pred}} \text{dist}(g, s)$$
→ Penalizes curve segments that the prediction misses.
### Final score
$$\text{Trajectory Score} = \frac{\text{Forward} + \text{Reverse}}{2}$$
Both terms are mean distances, so as written a lower value means better alignment.
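
A minimal Python sketch of the three formulas may make the two failure modes concrete. It assumes 2D points, Euclidean point-to-segment distance, and trajectories given as lists of `(x, y)` tuples. The helper names (`point_segment_dist`, `mean_dist_to_polyline`, `trajectory_score`) and the toy trajectories are illustrative and do not come from the source.

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b (assumed metric)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp onto the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def mean_dist_to_polyline(points, polyline):
    """Mean over `points` of the minimum distance to any segment of `polyline`."""
    segments = list(zip(polyline[:-1], polyline[1:]))
    if not segments:  # a single-vertex polyline is treated as one degenerate segment
        segments = [(polyline[0], polyline[0])]
    total = sum(min(point_segment_dist(p, a, b) for a, b in segments) for p in points)
    return total / len(points)

def trajectory_score(pred, gt):
    """Return (forward, reverse, mean of the two); all are distances."""
    forward = mean_dist_to_polyline(pred, gt)   # predicted points -> GT curve
    reverse = mean_dist_to_polyline(gt, pred)   # GT points -> predicted polyline
    return forward, reverse, (forward + reverse) / 2.0

# Toy data (hypothetical): a straight ground-truth path along the x-axis.
gt = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
truncated = [(0, 0), (1, 0)]                       # stops early
detour = [(0, 0), (2, 0), (2, 3), (2, 0), (4, 0)]  # full path plus a spur

trunc_fwd, trunc_rev, trunc_score = trajectory_score(truncated, gt)
det_fwd, det_rev, det_score = trajectory_score(detour, gt)
```

In this toy setup the truncated prediction has a forward distance of zero and is only caught by the reverse term, while the detour has a reverse distance of zero and is only caught by the forward term.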
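## Full reward composition
The path-tracing Accuracy RM is a weighted combination of:
1. Bidirectional trajectory accuracy
2. Endpoint accuracy (the start and end coordinates match)
3. Trajectory continuity penalty (applied when the distance from the last predicted point to the predicted endpoint exceeds a threshold)
4. Answer correctness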
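
The weighted combination might be sketched as follows. The source gives neither the weights, the threshold, nor the mapping from distances to per-component scores, so everything here is an assumption: each component is taken to be already normalized to `[0, 1]` (higher is better), the continuity penalty is modeled as a term that drops to 0 when the gap exceeds the threshold, and the weight values and the names `RewardWeights`, `continuity_term`, and `accuracy_reward` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source does not specify any values.
    trajectory: float = 0.4
    endpoint: float = 0.2
    continuity: float = 0.2
    answer: float = 0.2

def continuity_term(last_traj_point, predicted_endpoint, threshold):
    """1.0 if the trajectory ends close enough to the predicted endpoint, else 0.0."""
    gap = math.dist(last_traj_point, predicted_endpoint)
    return 0.0 if gap > threshold else 1.0

def accuracy_reward(traj_acc, endpoint_acc, continuity, answer_correct,
                    w=RewardWeights()):
    """Weighted sum of the four components, each assumed to lie in [0, 1]."""
    return (w.trajectory * traj_acc
            + w.endpoint * endpoint_acc
            + w.continuity * continuity
            + w.answer * float(answer_correct))

# Example: the trajectory stops 5 units short of the predicted endpoint.
broken = continuity_term((0.0, 0.0), (3.0, 4.0), threshold=4.0)
reward = accuracy_reward(traj_acc=1.0, endpoint_acc=1.0,
                         continuity=broken, answer_correct=True)
```

Keeping the continuity term separate from the trajectory term means a prediction that traces the curve well but jumps to its final answer still loses part of the reward.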
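## Related concepts
- [[path-tracing|Path tracing]]: the task this evaluation applies to
- [[exponential-decay-reward|Exponential decay reward]]: the corresponding scheme for counting tasks
- [[reward-model|Reward model]]: overall RM design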