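---
title: "RLHF (Reinforcement Learning from Human Feedback)"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [RLHF, alignment, LLM, training]
status: placeholder
---

# RLHF (Reinforcement Learning from Human Feedback)

> ⚠️ Placeholder page: to be completed

RLHF is an alignment method that applies reinforcement learning to human feedback. It is the main post-training paradigm used as an alternative or complement to supervised fine-tuning (SFT). The typical pipeline is: SFT → reward model training → PPO (Proximal Policy Optimization) optimization.

The comparison with SFT is important background for the discussion in [[zhang-reconciling-sft-interaction-2026|Zhang et al. (2026)]].

## Related concepts

- [[supervised-fine-tuning|SFT]]
- [[dpo]]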
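
## Sketch: objectives in the reward-model and PPO stages

To make the SFT → reward model → PPO pipeline described above a little more concrete, here is a minimal sketch of two quantities that commonly appear in it. It assumes a Bradley-Terry style pairwise loss for the reward model and a KL-penalized reward for the PPO stage. The page itself does not specify either choice, and the function names, the `beta` parameter, and the scalar inputs are all hypothetical, chosen only for illustration.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Assumed reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the human-preferred response gets the higher score.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Assumed PPO-stage reward: reward-model score minus a KL-style penalty.

    logp_policy and logp_ref are log-probabilities of the same sampled
    response under the policy being trained and under the frozen SFT
    reference model. beta is a hypothetical penalty weight.
    """
    return reward - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    # Correctly ranked pair gives a lower loss than a mis-ranked pair.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
    # The policy assigns more probability than the reference, so it is penalized.
    print(kl_shaped_reward(1.0, logp_policy=-1.0, logp_ref=-1.5))
```

In a real setup these scalars would be batched tensors produced by neural networks, and the shaped reward would feed a PPO update rather than being printed. The penalty term is what keeps the optimized policy close to the SFT model it started from.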