20260514: add new content

2026-05-14 13:54:52 +08:00
parent 56c4d3ef7c
commit b116710e4c
294 changed files with 10682 additions and 255 deletions
--- a/concepts/bidirectional-trajectory-evaluation.md
+++ b/concepts/bidirectional-trajectory-evaluation.md
@@ -0,0 +1,49 @@
+---
+title: "Bidirectional Trajectory Evaluation"
+domain: "Reinforcement Learning / Reward Design"
+tags: [trajectory, evaluation, path-tracing, reward]
+sources: [[thinking-with-visual-primitives]]
+---
+
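+# Bidirectional Trajectory Evaluation
+
+> The core evaluation method for path-tracing tasks: trajectory alignment is measured in both directions at once, from prediction to ground truth (forward) and from ground truth to prediction (reverse).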
+
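+## Why both directions are needed
+
+Each one-directional evaluation has a blind spot:
+- **Forward only**: the model can output just a few safe points near the start and still score well, even though the path is incomplete.
+- **Reverse only**: hallucinated detours that stray from the true path go unpenalized.
+
+**Combining both directions** pushes the model to output a coordinate trajectory that is both **complete and accurate**.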
+
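+## Computing the two directions
+
+### Forward
+For each **predicted point**, compute the minimum distance to any segment of the **ground-truth curve**, then average over all predicted points.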
+$$\text{Forward} = \frac{1}{N_{\text{pred}}} \sum_{p \in \text{pred}} \min_{s \in \text{GT}} \text{dist}(p, s)$$
+
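+→ Penalizes predicted points that stray from the true path.
+
+### Reverse
+For each **ground-truth point**, compute the minimum distance to any segment of the **predicted polyline**, then average over all ground-truth points.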
+$$\text{Reverse} = \frac{1}{N_{\text{GT}}} \sum_{g \in \text{GT}} \min_{s \in \text{pred}} \text{dist}(g, s)$$
+
+→ Penalizes curve segments that the prediction misses.
+
+### Final score
+$$\text{Trajectory Score} = \frac{\text{Forward} + \text{Reverse}}{2}$$
+
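The two directional terms and their average can be sketched in a few lines of Python. This is only a minimal illustration of the formulas above, not the source's implementation: the function names (`point_segment_distance`, `mean_min_distance`, `bidirectional_distance`), the use of 2D Euclidean distance, and the handling of single-point polylines are all assumptions. Note that, as defined, Forward and Reverse are mean distances, so smaller values mean better alignment; how the averaged value is mapped to a reward is not specified here.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment a-b (2D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_min_distance(points, polyline):
    """Mean, over `points`, of the min distance to any segment of `polyline`."""
    if not points or not polyline:
        raise ValueError("both point lists must be non-empty")
    if len(polyline) == 1:
        # Assumption: treat a single point as a zero-length segment.
        segments = [(polyline[0], polyline[0])]
    else:
        segments = list(zip(polyline, polyline[1:]))
    total = sum(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in points
    )
    return total / len(points)


def bidirectional_distance(pred, gt):
    """Return (forward, reverse, average) for predicted and ground-truth points."""
    forward = mean_min_distance(pred, gt)   # predicted points vs. GT segments
    reverse = mean_min_distance(gt, pred)   # GT points vs. predicted segments
    return forward, reverse, (forward + reverse) / 2.0


gt = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
truncated = [(0.0, 0.0), (1.0, 0.0)]  # only "safe" points near the start
detour = [(0.0, 0.0), (5.0, 0.0), (5.0, 4.0), (5.0, 0.0), (10.0, 0.0)]

print(bidirectional_distance(truncated, gt))  # forward is 0, reverse is not
print(bidirectional_distance(detour, gt))     # reverse is 0, forward is not
```

The two sample predictions mirror the failure modes from the motivation section: the truncated path is invisible to the forward term alone, and the detour is invisible to the reverse term alone, so only the combination penalizes both.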
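+## Full reward composition
+
+The path-tracing Accuracy RM is a weighted combination of:
+1. Bidirectional trajectory accuracy
+2. Endpoint accuracy (how well the start and end coordinates match)
+3. Trajectory continuity penalty (applied when the distance from the last predicted point to the predicted endpoint exceeds a threshold)
+4. Answer correctness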
+
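How the four components might be combined can be sketched as follows. Everything numeric here is hypothetical: the source gives neither the weights nor the threshold, and it does not say how each component is scaled. The sketch assumes the trajectory and endpoint terms have already been converted to scores in [0, 1] where higher is better, and that the continuity penalty is a fixed deduction; `HYPOTHETICAL_WEIGHTS`, `continuity_penalty`, and `accuracy_reward` are illustrative names only.

```python
import math

# Hypothetical weights, chosen only for illustration.
HYPOTHETICAL_WEIGHTS = {
    "trajectory": 0.4,
    "endpoint": 0.2,
    "continuity": 0.1,
    "answer": 0.3,
}


def continuity_penalty(last_pred_point, pred_endpoint, threshold):
    """Return 1.0 if the gap between the last predicted trajectory point
    and the predicted endpoint exceeds `threshold`, else 0.0."""
    gap = math.dist(last_pred_point, pred_endpoint)
    return 1.0 if gap > threshold else 0.0


def accuracy_reward(trajectory_score, endpoint_score, continuity,
                    answer_correct, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted combination of the four components.

    trajectory_score and endpoint_score are assumed to lie in [0, 1];
    continuity is the 0/1 penalty from continuity_penalty().
    """
    return (
        weights["trajectory"] * trajectory_score
        + weights["endpoint"] * endpoint_score
        - weights["continuity"] * continuity
        + weights["answer"] * (1.0 if answer_correct else 0.0)
    )


penalty = continuity_penalty((0.0, 0.0), (3.0, 4.0), threshold=4.0)
print(accuracy_reward(1.0, 1.0, penalty, answer_correct=True))
```

In this sketch the continuity term is subtracted rather than added, since the list describes it as a penalty; a real reward model could equally fold it in as a multiplicative factor.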
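+## Related concepts
+
+- [[path-tracing|path tracing]]: the task this evaluation is applied to
+- [[exponential-decay-reward|exponential-decay reward]]: the corresponding scheme for counting tasks
+- [[reward-model|reward model]]: the overall RM design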