---
title: "OneReason-Bench"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: [benchmark, recommendation, reasoning, evaluation]
sources: [raw/papers/onereason-team-onereason-2026.md]
---
# OneReason-Bench
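> A recommendation-reasoning benchmark proposed by OneReason that evaluates the reasoning ability of generative recommendation models across four progressive levels, R0-R3.
## Design Motivation
Although RecIF-Bench (OpenOneRec) broadened the evaluation scope for recommendation foundation models, its reasoning evaluation remains coarse-grained and insufficiently diagnostic. OneReason-Bench extends it into a multi-level reasoning evaluation.
## Evaluation Levels
The benchmark follows the four [[perception-cognition-recommendation|R0-R3]] levels: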
| Level | Task type | Core metrics |
|------|---------|---------|
| R0: Perception | Item Understanding, Pattern Grounding, Item QA | LLM-as-a-Judge, Pass@K, Accuracy |
| R1: Derivation | Item2Item association | Accuracy |
| R2: Evolution | Evolution action selection, topic generation, chain generation | F1, Action-Logic Score |
| R3: Recommendation | Single-domain and cross-domain recommendation | Pass@K, Recall@K |
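
The page does not spell out how Pass@K and Recall@K are computed, so the following is only a minimal sketch under the common convention: Pass@K counts a sample as solved if any of the first K generated candidates is a ground-truth item, and Recall@K is the fraction of ground-truth items found in the first K candidates. The function names and the string item IDs are assumptions for illustration, not part of OneReason-Bench.

```python
from typing import Sequence, Set


def pass_at_k(candidates: Sequence[str], targets: Set[str], k: int) -> float:
    """Return 1.0 if any of the first k candidates is a target, else 0.0.

    Assumed hit-style definition; the benchmark's exact formula may differ.
    """
    return 1.0 if any(c in targets for c in candidates[:k]) else 0.0


def recall_at_k(candidates: Sequence[str], targets: Set[str], k: int) -> float:
    """Return the fraction of targets that appear among the first k candidates."""
    if not targets:
        return 0.0  # no ground truth: define recall as 0 to avoid division by zero
    hits = len(set(candidates[:k]) & targets)
    return hits / len(targets)


# Hypothetical ranked output and ground-truth items for one sample.
ranked = ["item_a", "item_b", "item_c"]
truth = {"item_a", "item_c", "item_d"}
print(pass_at_k(ranked, truth, k=1))
print(recall_at_k(ranked, truth, k=3))
```

Averaging either function over all samples in a task would give the task-level score under these assumptions.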
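## Unified Task Format
All tasks are formalized as sequence generation, Y = F(X):
- X: task instruction + context (itemic pattern, user profile, interaction history)
- Y: itemic pattern, answer option, natural-language response, or structured evolution chain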
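
One way to picture the Y = F(X) format is as a small data structure that is flattened into a single input sequence. This is a hypothetical sketch: the `TaskInput` class, its field names, the prompt layout, and `run_task` are all assumptions for illustration and are not taken from the OneReason-Bench release.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskInput:
    """X: a task instruction plus optional context (hypothetical field names)."""
    instruction: str
    itemic_pattern: str = ""
    user_profile: str = ""
    interaction_history: List[str] = field(default_factory=list)


def build_prompt(x: TaskInput) -> str:
    """Flatten X into one text sequence, skipping empty context fields."""
    parts = [x.instruction]
    if x.itemic_pattern:
        parts.append(f"Itemic pattern: {x.itemic_pattern}")
    if x.user_profile:
        parts.append(f"User profile: {x.user_profile}")
    if x.interaction_history:
        parts.append("History: " + ", ".join(x.interaction_history))
    return "\n".join(parts)


def run_task(model: Callable[[str], str], x: TaskInput) -> str:
    """Y = F(X): the model maps the flattened input to an output sequence."""
    return model(build_prompt(x))


# A stand-in "model" that just reports how many lines of input it received.
def dummy_model(prompt: str) -> str:
    return f"lines={len(prompt.splitlines())}"


x = TaskInput(
    instruction="Recommend the next item.",
    user_profile="likes cooking videos",
    interaction_history=["item_1", "item_2"],
)
print(run_task(dummy_model, x))
```

Because every task shares this input-to-sequence shape, the same generation call can serve all four levels, with only the instruction and the scoring of Y changing per task.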
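## Role
OneReason-Bench is not just a leaderboard but a measurement protocol: at each development stage it informs, monitors, and validates design decisions.
## References
- [[onereason|OneReason]]
- [[perception-cognition-recommendation|Perception-Cognition Recommendation Hierarchy]]
- [[recommendation-reasoning|Recommendation Reasoning]]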