---
title: "Synthetic Data"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [synthetic-data, training, data-generation]
status: placeholder
---
# Synthetic Data
> ⚠️ Placeholder page; to be expanded
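Synthetic data is artificial data generated by algorithms or models to augment or replace real training data. In LLM training it is widely used for:
- **Question generation**: e.g., the multi-turn reasoning data synthesized for [[mathchatsync-reasoning|MathChatSync]]
- **Instruction data**: strong models such as GPT-4 generate instruction-response pairs
- **Data augmentation**: filling gaps in domains where real data is scarce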
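
As a hedged illustration of algorithmic generation, here is a minimal sketch that builds instruction-response pairs from simple arithmetic templates. It stands in for the strong-model generation described above. It is not MathChatSync's method or any specific pipeline. `Example`, `TEMPLATES`, and `generate` are hypothetical names. Because the answers are computed rather than generated, they are correct by construction, and an exact-duplicate filter serves as one basic quality step.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Example:
    """One synthetic instruction-response pair."""
    instruction: str
    response: str


# Hypothetical templates (assumption): a real pipeline would more likely
# prompt a strong model. Here each answer is computed, so it is always correct.
TEMPLATES = [
    ("What is {a} + {b}?", lambda a, b: a + b),
    ("What is {a} * {b}?", lambda a, b: a * b),
    ("What is {a} - {b}?", lambda a, b: a - b),
]


def generate(n: int, seed: int = 0, max_operand: int = 20) -> list[Example]:
    """Sample n distinct instruction-response pairs from TEMPLATES."""
    capacity = len(TEMPLATES) * (max_operand + 1) ** 2
    if n > capacity:
        # Guard against an endless loop: only `capacity` unique prompts exist.
        raise ValueError(f"only {capacity} unique instructions are possible")
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    seen: set[str] = set()
    out: list[Example] = []
    while len(out) < n:
        template, solve = rng.choice(TEMPLATES)
        a, b = rng.randint(0, max_operand), rng.randint(0, max_operand)
        instruction = template.format(a=a, b=b)
        if instruction in seen:  # exact-duplicate filter
            continue
        seen.add(instruction)
        out.append(Example(instruction, str(solve(a, b))))
    return out


if __name__ == "__main__":
    # Emit a few records as JSON Lines, a common format for training data.
    for ex in generate(3, seed=42):
        print(json.dumps(asdict(ex)))
```

Templates keep the sketch self-contained and deterministic. With model-generated pairs, the exact-match check would usually be replaced by answer verification and near-duplicate filtering.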
## Related Concepts
- [[mathchatsync-reasoning|MathChatSync Reasoning]]
- [[synthetic-data-qa-generation|Synthetic Data QA Generation]]
- [[data-quality-over-scale|Data Quality over Scale]]