---
title: "ME² Principle"
created: 2026-06-24
updated: 2026-06-24
type: concept
tags: ["reasoning", "evaluation", "quality-metrics"]
sources:
- "[[me2-trm-reasoning-2026]]"
---
# ME² Principle
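ME² is a principle for characterizing reasoning quality, proposed by Zhang et al. (ICML 2026). It defines high-quality reasoning along two orthogonal axes: Macro vs. Micro, and Effectiveness vs. Efficiency.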
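## Four Quadrants
| | Macro (global structure) | Micro (local steps) |
|---|---|---|
| **Effectiveness** | Sound structure, no redundant branches | Correct, logically coherent steps |
| **Efficiency** | Concise path, no detours | Lean steps, no verbosity |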
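
The four quadrants above can be held in a small record type. The following is a minimal sketch, not the authors' implementation: the class name `ME2Score`, its field names, the assumed [0, 1] score range, and the `weakest_quadrant` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ME2Score:
    """Hypothetical container: one score per ME² quadrant (assumed range [0, 1])."""

    macro_effectiveness: float  # sound global structure, no redundant branches
    micro_effectiveness: float  # correct, logically coherent steps
    macro_efficiency: float     # concise path, no detours
    micro_efficiency: float     # lean steps, no verbosity

    def weakest_quadrant(self) -> str:
        """Return the field name of the lowest-scoring quadrant.

        Ties resolve to the first quadrant in field order.
        """
        scores = asdict(self)
        return min(scores, key=scores.get)
```

For example, `ME2Score(0.9, 0.95, 0.4, 0.7).weakest_quadrant()` returns `"macro_efficiency"`, flagging a trace whose steps are correct but whose overall path takes detours.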
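## The Problem with PRMs
Process Reward Models (PRMs) typically cover only Micro-Effectiveness: their step-level correctness annotations ignore macro-level structural organization and the efficiency dimension. ME² offers a unified evaluation perspective in which reasoning quality must account for all four dimensions at once.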
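
The coverage gap described above can be made explicit with a small enumeration. This is an illustrative sketch only: `Quadrant`, `PRM_COVERAGE`, and `uncovered` are hypothetical names, and the coverage set simply encodes this note's claim that step-level labels address one quadrant.

```python
from enum import Enum


class Quadrant(Enum):
    MACRO_EFFECTIVENESS = "macro-effectiveness"
    MICRO_EFFECTIVENESS = "micro-effectiveness"
    MACRO_EFFICIENCY = "macro-efficiency"
    MICRO_EFFICIENCY = "micro-efficiency"


# Assumption taken from the note: step-level correctness labels
# address only the Micro-Effectiveness quadrant.
PRM_COVERAGE = frozenset({Quadrant.MICRO_EFFECTIVENESS})


def uncovered(coverage: frozenset) -> list:
    """List the ME² quadrants an evaluator does not address, in definition order."""
    return [q for q in Quadrant if q not in coverage]
```

Calling `uncovered(PRM_COVERAGE)` yields the three quadrants a typical PRM leaves unassessed: both Macro quadrants and Micro-Efficiency.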
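## Decoupling from Answer Correctness
ME² evaluates only the **quality of the reasoning trace**, independent of whether the final answer is correct. TRM is trained on preference data built from pairs of verified-correct reasoning traces. Because both traces in a pair reach a correct answer, the preference between them can only reflect trace quality, which shows that reasoning quality can be evaluated independently of answer correctness.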
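
One way to picture preference data restricted to verified-correct traces is sketched below. It makes no claim about how the TRM dataset was actually constructed: the `Trace` type, its `quality` field (standing in for whatever judgment ranks two traces), and `build_preference_pairs` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Trace:
    text: str
    answer_correct: bool  # outcome of an answer verifier (assumed to be given)
    quality: float        # hypothetical trace-quality judgment, higher is better


def build_preference_pairs(traces):
    """Return (chosen, rejected) pairs drawn only from verified-correct traces."""
    correct = [t for t in traces if t.answer_correct]
    pairs = []
    for a, b in combinations(correct, 2):
        if a.quality == b.quality:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if a.quality > b.quality else (b, a)
        pairs.append((chosen, rejected))
    return pairs
```

Since incorrect traces are filtered out before pairing, answer correctness is constant within every pair, so any learned preference has to come from differences in the trace itself.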
## References
- [[me2-trm-reasoning-2026]]
- [[thinking-reward-model]]
- [[dag-reasoning-evaluation]]