Files
myWiki/concepts/mathchatsync-reasoning.md

49 lines
1.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "MathChatSync Reasoning"
created: 2025-06-02
updated: 2025-06-02
type: concept
tags: [dataset, multi-turn-reasoning, math]
sources: ["[[goru-one-pass-to-reason-2025]]"]
---
# MathChatSync Reasoning
> [[goru-one-pass-to-reason-2025|One-Pass to Reason]] 论文中创建并发布的首个公开多轮推理数据集,基于 MathChatSync 用 GPT-4.1-mini 合成推理 token。
## 背景
现有的推理模型([[deepseek-r1]] 等)主要在单轮推理数据上训练。缺乏公开的多轮推理数据集是 [[multi-turn-reasoning|多轮推理训练]] 研究的瓶颈。
## 构建方法
1. **源数据**MathChatSyncLiang et al., 2024—— 多轮数学对话数据集
2. **推理合成**:用 GPT-4.1-mini 为每个助手回复生成推理 token
3. **条件**:推理生成基于对话历史和当前助手回复内容
## 特点
- **多轮结构**:每个对话包含 N 轮交替的人类消息和助手回复
- **显式推理**:每个助手回复 ai = (ti, ri),包括 reasoning token 和 response token
- **对话深度**116 轮(受 MathChatSync 分布影响,偏 57 轮)
## 实验分组
论文按对话深度将数据分为三组:
- **G1**15 轮
- **G2**67 轮
- **G3**816 轮
实验中验证了深度越大,[[one-pass-fine-tuning|1-Pass]] 加速越明显(符合 O(N²) vs O(N³) 的理论预测)。
## 获取
- HuggingFace: `devrev-research/MathChatSync-reasoning`
- 论文代码: `github.com/devrev/One-Pass-to-Reason`
## 相关
- [[multi-turn-reasoning]]
- [[goru-one-pass-to-reason-2025|One-Pass to Reason 论文]]
- [[synthetic-data|合成数据]]