---
title: "RLHF (Reinforcement Learning from Human Feedback)"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [RLHF, alignment, LLM, training]
status: placeholder
---
# RLHF (Reinforcement Learning from Human Feedback)
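> ⚠️ Placeholder page: to be completed

RLHF is a reinforcement learning alignment method driven by human feedback. It is the main post-training paradigm used as an alternative or complement to SFT. Typical pipeline: SFT → reward model training → PPO optimization.

The comparison with SFT is important background for the discussion in [[zhang-reconciling-sft-interaction-2026|Zhang et al. (2026)]].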
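
The last two stages of the pipeline (reward model training and PPO optimization) can be sketched with two scalar quantities. This is a minimal illustration under assumptions not stated on this page: it uses the Bradley-Terry style pairwise loss and the KL-penalized reward that are common in published RLHF setups. The function names and the `beta` default are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for reward model training (assumed Bradley-Terry form).

    Computes -log(sigmoid(r_chosen - r_rejected)): small when the reward
    model scores the human-preferred response above the rejected one.
    """
    margin = r_chosen - r_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Reward signal passed to PPO (assumed KL-penalized form).

    Subtracts beta * (log pi(y|x) - log pi_ref(y|x)) so that the policy is
    discouraged from drifting far from the SFT reference model.
    """
    return reward - beta * (logp_policy - logp_ref)


# Hypothetical scores: the reward model agrees with the human preference.
loss_agree = pairwise_reward_loss(r_chosen=2.0, r_rejected=0.0)
# Same scores with the preference flipped: the loss is larger.
loss_disagree = pairwise_reward_loss(r_chosen=0.0, r_rejected=2.0)
```

In a real pipeline these quantities are computed over batches of token sequences by neural networks; the scalar version only shows the shape of the objectives.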
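## Spiral of silence dimension
RLHF alignment training lowers token-prediction entropy in order to avoid risk, which compresses the model's creative space. This is the core of the [[rlhf-alignment-amplification|RLHF alignment amplification]] effect, and it has been confirmed as one of the four technical root causes of the [[llm-spiral-of-silence-2026|LLM spiral of silence]].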
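
To make "token-prediction entropy" concrete, the sketch below computes the Shannon entropy of a next-token distribution and compares a flatter distribution with a sharper one. The logits are invented for illustration, and scaling them by a constant is only a stand-in for a model becoming more confident. It is not a model of what RLHF actually does to the weights.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token logits over a four-token vocabulary.
base_logits = [2.0, 1.5, 1.0, 0.5]
# Assumed stand-in for a more confident model: the same logits, scaled up.
sharpened_logits = [x * 3.0 for x in base_logits]

h_base = entropy(softmax(base_logits))
h_sharp = entropy(softmax(sharpened_logits))

# The sharper distribution concentrates mass on fewer tokens,
# so its entropy is lower than that of the base distribution.
print(f"base entropy: {h_base:.3f} nats, sharpened entropy: {h_sharp:.3f} nats")
```

Lower entropy means fewer plausible continuations at each step, which is the sense in which the text above speaks of a compressed creative space.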
## Related concepts
- [[supervised-fine-tuning|SFT]]
- [[dpo]]
- [[rlhf-alignment-amplification|RLHF alignment amplification]]
- [[llm-spiral-of-silence-2026|LLM spiral of silence]]