---
title: "RLHF (Reinforcement Learning from Human Feedback)"
created: 2026-06-03
updated: 2026-06-03
type: concept
tags: [RLHF, alignment, LLM, training]
status: placeholder
---
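
# RLHF (Reinforcement Learning from Human Feedback)

> ⚠️ Placeholder page: to be completed

RLHF is an alignment method that applies reinforcement learning to human feedback. It is a major post-training paradigm that complements, and in some setups replaces, SFT. The typical pipeline is SFT → reward model training → PPO optimization.

The comparison with SFT is important background for the discussion in [[zhang-reconciling-sft-interaction-2026|Zhang et al. (2026)]].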
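
The two later stages of the pipeline above (reward model training, then PPO optimization) can be sketched with two small formulas. The block below is a minimal illustration, not the method of any work linked on this page. It assumes a Bradley-Terry style pairwise loss for the reward model and a KL-style penalty against the SFT reference during the RL stage, both common choices. The function names and the `beta` value are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it ranks them wrongly.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Reward-model score minus a penalty for drifting from the SFT reference.

    `beta` is an assumed illustrative coefficient, not a recommended value.
    """
    return reward - beta * (logp_policy - logp_ref)
```

With equal scores the pairwise loss is `log(2)`, and it shrinks as the preferred response's margin grows. The shaped reward is the kind of quantity a PPO step would maximize in this sketch.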
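
## Spiral of Silence Dimension

RLHF alignment training lowers the entropy of token predictions in order to avoid risky outputs, which narrows the model's creative range. This is the core of the [[rlhf-alignment-amplification|RLHF alignment amplification]] effect, which has been identified as one of the four technical root causes of the [[llm-spiral-of-silence-2026|LLM spiral of silence]].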
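
To make "token prediction entropy" concrete, the sketch below computes the Shannon entropy of a next-token distribution and compares a baseline with a more peaked one. The logits are made up, and doubling them is only a stand-in for a distribution that concentrates on fewer tokens. It does not model RLHF itself.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits over a four-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.0]
base = softmax(logits)
# Assumed stand-in for a sharpened distribution: scale the logits by 2.
sharpened = softmax([2.0 * x for x in logits])

print(f"baseline entropy:  {token_entropy(base):.3f} nats")
print(f"sharpened entropy: {token_entropy(sharpened):.3f} nats")
```

The sharpened distribution has lower entropy, meaning fewer tokens carry meaningful probability. That is the sense in which lower entropy narrows the range of continuations a model is likely to sample.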

## Related Concepts

- [[supervised-fine-tuning|SFT]]
- [[dpo]]
- [[rlhf-alignment-amplification|RLHF alignment amplification]]
- [[llm-spiral-of-silence-2026|LLM spiral of silence]]