20260625: lots of new content
33
concepts/me2-principle.md
Normal file
@@ -0,0 +1,33 @@
---
title: "ME² Principle"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["reasoning", "evaluation", "quality-metrics"]
sources:
- "[[me2-trm-reasoning-2026]]"
---

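# ME² Principle

ME² is a principle for characterizing reasoning quality, proposed by Zhang et al. (ICML 2026). It defines high-quality reasoning along two orthogonal axes: Macro vs. Micro, and Effectiveness vs. Efficiency.
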
## Four quadrants

| | Macro (global structure) | Micro (local steps) |
|---|---|---|
| **Effectiveness** | Sound structure, no redundant branches | Correct steps, coherent logic |
| **Efficiency** | Concise path, no detours | Compact steps, no verbosity |

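## The problem with PRMs

Process Reward Models (PRMs) typically cover only Micro-Effectiveness (step-level correctness labels) and ignore both macro-level structural organization and the efficiency dimension. ME² offers a unified evaluation perspective: reasoning quality has to be assessed on all four dimensions at once.
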
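As a rough illustration of this coverage gap, the four ME² quadrants can be modelled as the product of two small enums, with an evaluator described by the set of quadrants it scores. This is only a sketch: the names `Criterion`, `Scale`, `QUADRANTS`, `PRM_COVERAGE`, and `uncovered` are hypothetical and are not taken from Zhang et al.

```python
from enum import Enum
from itertools import product


class Criterion(Enum):
    EFFECTIVENESS = "effectiveness"
    EFFICIENCY = "efficiency"


class Scale(Enum):
    MACRO = "macro"  # global structure of the reasoning trace
    MICRO = "micro"  # individual local steps


# The four quadrants are the Cartesian product of the two axes.
QUADRANTS = tuple(product(Criterion, Scale))

# Assumption for illustration: a typical PRM scores step-level correctness only.
PRM_COVERAGE = {(Criterion.EFFECTIVENESS, Scale.MICRO)}


def uncovered(coverage):
    """Return the quadrants that an evaluator with this coverage leaves out."""
    return [q for q in QUADRANTS if q not in coverage]
```

Calling `uncovered(PRM_COVERAGE)` returns the three quadrants a step-level PRM leaves unscored: Macro-Effectiveness and both Efficiency quadrants.
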
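## Decoupling from answer correctness

ME² evaluates only the **quality of the reasoning trace**, independent of whether the final answer is correct. TRM is trained on preference data built from pairs of verified-correct reasoning traces, which shows that reasoning quality can be evaluated independently of answer correctness.
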
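One way to picture preference data that is decoupled from correctness is to form pairs only among traces whose answers have already been verified, so that the preference can only reflect trace quality. The sketch below is an assumption for illustration, not the authors' actual data pipeline: the `Trace` type, its `quality` field (a stand-in for whatever trace-quality judgment is available), and `build_preference_pairs` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Trace:
    text: str
    answer_correct: bool  # verified against a reference answer
    quality: float        # hypothetical trace-quality judgment, higher is better


def build_preference_pairs(traces):
    """Return (chosen, rejected) pairs drawn only from verified-correct traces."""
    correct = [t for t in traces if t.answer_correct]
    pairs = []
    for a, b in combinations(correct, 2):
        if a.quality == b.quality:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if a.quality > b.quality else (b, a)
        pairs.append((chosen, rejected))
    return pairs
```

Because incorrect traces are filtered out before pairing, both members of every pair reach a correct answer, and the only thing separating "chosen" from "rejected" is the quality of the reasoning itself.
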
## References

- [[me2-trm-reasoning-2026]]
- [[thinking-reward-model]]
- [[dag-reasoning-evaluation]]