20260514: Add new content
concepts/bidirectional-trajectory-evaluation.md

---
title: "Bidirectional Trajectory Evaluation"
domain: "Reinforcement Learning / Reward Design"
tags: [trajectory, evaluation, path-tracing, reward]
sources: [[thinking-with-visual-primitives]]
---
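
# Bidirectional Trajectory Evaluation

> The core evaluation method for path-tracing tasks: trajectory alignment is computed in both directions, from prediction to ground truth (forward) and from ground truth to prediction (reverse).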
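
## Why bidirectional

One-directional evaluation has blind spots:

- **Forward only**: a model that outputs just a few "safe" points near the start scores well, because every point it does output lies on the true path, even though most of the path is missing.
- **Reverse only**: detours the model hallucinates off the true path go unpenalized, because Reverse only measures how far each ground-truth point is from the prediction.

**Combining both directions** → incentivizes the model to output coordinate trajectories that are both **complete and accurate**.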
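
A small illustrative example (the coordinates are chosen for illustration and are not from the source): take a ground-truth polyline with vertices $(0,0)$, $(5,0)$, $(10,0)$, score two degenerate predictions $\hat{P}$ with the Forward and Reverse definitions from the computation section, and use each polyline's vertices as its points.

$$\text{Truncated: } \hat{P} = [(0,0),(1,0)] \;\Rightarrow\; \text{Forward} = \tfrac{0+0}{2} = 0,\quad \text{Reverse} = \tfrac{0+4+9}{3} = \tfrac{13}{3} \approx 4.33$$

$$\text{Detour: } \hat{P} = [(0,0),(5,0),(5,5),(5,0),(10,0)] \;\Rightarrow\; \text{Forward} = \tfrac{0+0+5+0+0}{5} = 1,\quad \text{Reverse} = \tfrac{0+0+0}{3} = 0$$

Forward alone rates the truncated path as perfect, and Reverse alone rates the detour as perfect. Each failure shows up only in the other direction.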

## Bidirectional computation

### Forward

For each **predicted point**, compute the minimum distance to any segment of the **ground-truth curve**, then average over all predicted points:

$$\text{Forward} = \frac{1}{N_{\text{pred}}} \sum_{p \in \text{pred}} \min_{s \in \text{GT}} \text{dist}(p, s)$$

→ penalizes predicted points that stray from the true path
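
Here is a minimal Python sketch of the Forward term. It assumes 2-D trajectories stored as lists of `(x, y)` tuples and takes $\text{dist}$ to be Euclidean point-to-segment distance (the source does not name the distance function). The names `point_to_segment` and `forward_distance` are hypothetical.

```python
import math

Point = tuple[float, float]


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment a-b (assumed dist)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # zero-length segment: plain point-to-point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def forward_distance(pred: list[Point], gt: list[Point]) -> float:
    """Mean over predicted points of the distance to the nearest GT segment."""
    # A one-point GT is treated as a single zero-length segment (assumption).
    segments = list(zip(gt, gt[1:])) or [(gt[0], gt[0])]
    # Assumes pred is non-empty; an empty prediction would divide by zero.
    total = sum(min(point_to_segment(p, a, b) for a, b in segments) for p in pred)
    return total / len(pred)


gt = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
pred = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]
print(forward_distance(pred, gt))  # (0 + 3 + 0) / 3 = 1.0
```

Only the middle point is off the path, so it alone contributes to the average.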

### Reverse

For each **ground-truth point**, compute the minimum distance to any segment of the **predicted polyline**, then average over all ground-truth points:

$$\text{Reverse} = \frac{1}{N_{\text{GT}}} \sum_{g \in \text{GT}} \min_{s \in \text{pred}} \text{dist}(g, s)$$

→ penalizes missed curve segments
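
The Reverse term mirrors Forward with the roles of prediction and ground truth swapped. The sketch below keeps the same assumptions (2-D `(x, y)` tuples, Euclidean point-to-segment distance, hypothetical names `point_to_segment` and `reverse_distance`) and shows that a truncated prediction is caught.

```python
import math

Point = tuple[float, float]


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment a-b (assumed dist)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # zero-length segment: plain point-to-point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def reverse_distance(pred: list[Point], gt: list[Point]) -> float:
    """Mean over GT points of the distance to the nearest predicted segment."""
    # A one-point prediction is treated as a zero-length segment (assumption).
    segments = list(zip(pred, pred[1:])) or [(pred[0], pred[0])]
    total = sum(min(point_to_segment(g, a, b) for a, b in segments) for g in gt)
    return total / len(gt)


gt = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
truncated = [(0.0, 0.0), (1.0, 0.0)]  # stops shortly after the start
print(reverse_distance(truncated, gt))  # (0 + 4 + 9) / 3 ≈ 4.333
```

The unreached ground-truth points at x = 5 and x = 10 carry the penalty that Forward alone would miss.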
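
### Final score

$$\text{Trajectory Score} = \frac{\text{Forward} + \text{Reverse}}{2}$$

Both terms are mean distances, so a lower Trajectory Score means closer alignment between the predicted and ground-truth trajectories.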
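
The following hypothetical `trajectory_score` sketch puts both directions together. It uses the same assumptions (2-D `(x, y)` tuples, Euclidean point-to-segment distance; `mean_min_distance` is also an illustrative name) and reproduces the truncated-path and detour failure modes.

```python
import math

Point = tuple[float, float]


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment a-b (assumed dist)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # zero-length segment: plain point-to-point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def mean_min_distance(points: list[Point], polyline: list[Point]) -> float:
    """Mean over `points` of the distance to the nearest segment of `polyline`."""
    segments = list(zip(polyline, polyline[1:])) or [(polyline[0], polyline[0])]
    total = sum(min(point_to_segment(p, a, b) for a, b in segments) for p in points)
    return total / len(points)


def trajectory_score(pred: list[Point], gt: list[Point]) -> float:
    """(Forward + Reverse) / 2; lower means better alignment."""
    forward = mean_min_distance(pred, gt)  # pred -> GT: off-path points
    reverse = mean_min_distance(gt, pred)  # GT -> pred: missed segments
    return (forward + reverse) / 2


gt = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
truncated = [(0.0, 0.0), (1.0, 0.0)]
detour = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (5.0, 0.0), (10.0, 0.0)]
print(trajectory_score(gt, gt))         # 0.0
print(trajectory_score(truncated, gt))  # (0 + 13/3) / 2 ≈ 2.167
print(trajectory_score(detour, gt))     # (1 + 0) / 2 = 0.5
```

Each degenerate prediction is penalized by exactly one direction, so averaging the two keeps either failure from going unnoticed.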
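
## Full reward composition

The path-tracing Accuracy RM is a weighted combination of:

1. Bidirectional trajectory accuracy
2. Endpoint accuracy (predicted start and end coordinates match the ground truth)
3. Trajectory continuity penalty (applied when the distance from the last predicted trajectory point to the predicted endpoint exceeds a threshold)
4. Answer correctness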
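
The source does not give the weights, the continuity threshold, or how distances are turned into reward values. The following is therefore only an illustrative sketch of one way to combine the four components. Every numeric value, the linear `dist_to_score` mapping, the additive form, and all names (`RewardWeights`, `accuracy_reward`, `gap_threshold`) are hypothetical assumptions.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source only says the RM is a weighted combination.
    trajectory: float = 0.4
    endpoints: float = 0.2
    continuity: float = 0.1
    answer: float = 0.3


def dist_to_score(d: float, scale: float = 10.0) -> float:
    """Hypothetical linear map: distance 0 -> 1.0, distance >= scale -> 0.0."""
    return max(0.0, 1.0 - d / scale)


def accuracy_reward(
    traj_distance: float,  # Trajectory Score, i.e. (Forward + Reverse) / 2
    pred_start: Point,
    pred_end: Point,
    gt_start: Point,
    gt_end: Point,
    last_path_point: Point,  # last point of the predicted trajectory
    answer_correct: bool,
    w: RewardWeights = RewardWeights(),
    gap_threshold: float = 5.0,  # hypothetical continuity threshold
) -> float:
    # 1. Bidirectional trajectory accuracy, mapped from distance to [0, 1].
    traj_term = dist_to_score(traj_distance)
    # 2. Endpoint accuracy: average of the start-point and end-point matches.
    endpoint_term = (
        dist_to_score(math.dist(pred_start, gt_start))
        + dist_to_score(math.dist(pred_end, gt_end))
    ) / 2
    # 3. Continuity penalty: last path point too far from the predicted endpoint.
    gap = math.dist(last_path_point, pred_end)
    continuity_penalty = 1.0 if gap > gap_threshold else 0.0
    # 4. Answer correctness as a 0/1 term.
    return (
        w.trajectory * traj_term
        + w.endpoints * endpoint_term
        + w.answer * float(answer_correct)
        - w.continuity * continuity_penalty
    )


# Path reaches the predicted endpoint: no continuity penalty.
print(accuracy_reward(0.5, (0, 0), (10, 0), (0, 0), (10, 0), (9, 0), True))  # ≈ 0.88
# Path stops 8 units short of the predicted endpoint: penalty applies.
print(accuracy_reward(0.5, (0, 0), (10, 0), (0, 0), (10, 0), (2, 0), True))  # ≈ 0.78
```

A gated or multiplicative combination would fit the description equally well. The additive form here is only one reading of "weighted combination".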

## Related concepts

- [[path-tracing|Path Tracing]]: the task this method is applied to
- [[exponential-decay-reward|Exponential Decay Reward]]: the corresponding scheme for counting tasks
- [[reward-model|Reward Model]]: overall RM design