---
title: "Supervised Fine-Tuning (SFT)"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [fine-tuning, LLM, training]
status: placeholder
---
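# Supervised Fine-Tuning (SFT)

> ⚠️ Placeholder page: to be completed

Supervised fine-tuning (SFT) is the standard paradigm for further training a pretrained LLM on labeled data (input-output pairs). It is widely used for instruction tuning, domain adaptation, and similar scenarios.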
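
To make "training on input-output pairs" concrete, here is a minimal sketch of the objective SFT usually minimizes: the mean next-token negative log-likelihood over the output tokens. Masking out the prompt positions is a common convention that is assumed here, not something this page specifies. The names `sft_loss` and `loss_mask` and the toy values are illustrative, and a real setup would compute the same quantity with a deep-learning framework instead of plain Python.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(logits, target_ids, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    logits:     one list of vocabulary logits per sequence position
    target_ids: the reference token id at each position
    loss_mask:  1 for output (response) tokens, 0 for prompt tokens
                (assumption: the prompt is excluded from the loss)
    """
    total, count = 0.0, 0
    for step_logits, target, keep in zip(logits, target_ids, loss_mask):
        if keep:
            total -= log_softmax(step_logits)[target]
            count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count


# Toy example: vocabulary of 3 tokens, 2 prompt positions, 2 output positions.
logits = [
    [0.0, 0.0, 0.0],  # prompt position (masked out)
    [0.0, 0.0, 0.0],  # prompt position (masked out)
    [2.0, 0.0, 0.0],  # output position, reference token 0
    [0.0, 2.0, 0.0],  # output position, reference token 1
]
target_ids = [0, 1, 0, 1]
loss_mask = [0, 0, 1, 1]

loss = sft_loss(logits, target_ids, loss_mask)
print(f"SFT loss on output tokens: {loss:.4f}")
```

Fine-tuning then amounts to repeatedly lowering this loss by gradient descent over the labeled pairs.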
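**Key controversy:** SFT is broadly effective on small DNNs, but its effect on LLMs is inconsistent: it sometimes improves instruction following and sometimes causes overfitting and degraded generalization.

**Key papers:**

- [[zhang-reconciling-sft-interaction-2026|Zhang et al. (2026)]]: explains, from an interaction perspective, why the effects of SFT are inconsistent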
## Related concepts

- [[sft-denoising-stage|SFT denoising stage]]
- [[sft-early-stopping|SFT early stopping]]
- [[lora]]
- [[rlhf]]
- [[dpo]]