20260617:目前有914 页
This commit is contained in:
48
concepts/mathchatsync-reasoning.md
Normal file
48
concepts/mathchatsync-reasoning.md
Normal file
@@ -0,0 +1,48 @@
|
||||
---
|
||||
title: "MathChatSync Reasoning"
|
||||
created: 2025-06-02
|
||||
updated: 2025-06-02
|
||||
type: concept
|
||||
tags: [dataset, multi-turn-reasoning, math]
|
||||
sources: ["[[goru-one-pass-to-reason-2025]]"]
|
||||
---
|
||||
|
||||
# MathChatSync Reasoning
|
||||
|
||||
> [[goru-one-pass-to-reason-2025|One-Pass to Reason]] 论文中创建并发布的首个公开多轮推理数据集,基于 MathChatSync 用 GPT-4.1-mini 合成推理 token。
|
||||
|
||||
## 背景
|
||||
|
||||
现有的推理模型([[deepseek-r1]] 等)主要在单轮推理数据上训练。缺乏公开的多轮推理数据集是 [[multi-turn-reasoning|多轮推理训练]] 研究的瓶颈。
|
||||
|
||||
## 构建方法
|
||||
|
||||
1. **源数据**:MathChatSync(Liang et al., 2024)—— 多轮数学对话数据集
|
||||
2. **推理合成**:用 GPT-4.1-mini 为每个助手回复生成推理 token
|
||||
3. **条件**:推理生成基于对话历史和当前助手回复内容
|
||||
|
||||
## 特点
|
||||
|
||||
- **多轮结构**:每个对话包含 N 轮交替的人类消息和助手回复
|
||||
- **显式推理**:每个助手回复 ai = (ti, ri),包括 reasoning token 和 response token
|
||||
- **对话深度**:1–16 轮(受 MathChatSync 分布影响,偏 5–7 轮)
|
||||
|
||||
## 实验分组
|
||||
|
||||
论文按对话深度将数据分为三组:
|
||||
- **G1**:1–5 轮
|
||||
- **G2**:6–7 轮
|
||||
- **G3**:8–16 轮
|
||||
|
||||
实验中验证了深度越大,[[one-pass-fine-tuning|1-Pass]] 加速越明显(符合 O(N²) vs O(N³) 的理论预测)。
|
||||
|
||||
## 获取
|
||||
|
||||
- HuggingFace: `devrev-research/MathChatSync-reasoning`
|
||||
- 论文代码: `github.com/devrev/One-Pass-to-Reason`
|
||||
|
||||
## 相关
|
||||
|
||||
- [[multi-turn-reasoning]]
|
||||
- [[goru-one-pass-to-reason-2025|One-Pass to Reason 论文]]
|
||||
- [[synthetic-data|合成数据]]
|
||||
Reference in New Issue
Block a user