20260617: currently 914 pages
27
concepts/supervised-fine-tuning.md
Normal file
@@ -0,0 +1,27 @@
---
title: "Supervised Fine-Tuning (SFT)"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [fine-tuning, LLM, training]
status: placeholder
---

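# Supervised Fine-Tuning (SFT)

> ⚠️ Placeholder page: to be completed

Supervised fine-tuning (SFT) is the standard paradigm for further training a pretrained LLM on labeled data (input-output pairs). It is widely used for instruction tuning, domain adaptation, and similar scenarios.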
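
As a rough illustration of what "training on input-output pairs" means at the loss level, the sketch below computes a token-level negative log-likelihood over a toy sequence. It assumes the common (but not universal) convention of scoring only the output tokens and masking out the input tokens, which this page does not specify. The names `sft_loss` and `loss_mask`, and the toy logits, are hypothetical and chosen only for this example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(step_logits, target_ids, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    Assumption: input (prompt) positions carry mask 0 and output (response)
    positions carry mask 1, so only the labeled output contributes.
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(step_logits, target_ids, loss_mask):
        if keep:
            total -= log_softmax(logits)[target]
            count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens, 2 input positions then 2 output positions.
step_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
target_ids = [1, 2, 0, 2]
loss_mask = [0, 0, 1, 1]  # score only the output tokens
loss = sft_loss(step_logits, target_ids, loss_mask)
```

In a real training loop this quantity would be computed from model outputs and minimized by gradient descent; the pure-Python version here is only meant to make the objective concrete.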
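
**Key controversy**: SFT is broadly effective for small DNNs, but its effect on LLMs is inconsistent: it sometimes improves instruction following and sometimes causes overfitting and degraded generalization.

**Core papers**:
- [[zhang-reconciling-sft-interaction-2026|Zhang et al. (2026)]]: explains the inconsistent effects of SFT from an interaction perspective
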
## Related concepts

- [[sft-denoising-stage|SFT denoising stage]]
- [[sft-early-stopping|SFT early stopping strategy]]
- [[lora]]
- [[rlhf]]
- [[dpo]]