---
title: "Hybrid Reasoning Models"
created: 2026-06-18
updated: 2026-06-18
type: concept
tags: [reasoning, efficiency, rl, thinking]
sources:
- gan-thinking-based-non-thinking-2026
---
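# Hybrid Reasoning Models
Hybrid reasoning models are reasoning models that can **dynamically decide whether to activate thinking mode**, switching automatically between [[thinking-mode|thinking mode]] and [[non-thinking-mode|non-thinking mode]] according to query complexity (Zhang et al., 2025; Fang et al., 2025; Tu et al., 2025).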
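## Motivation: Addressing Overthinking
The strong performance of [[large-reasoning-models|large reasoning models]] depends on long chains of thought ([[chain-of-thought|CoT]]), but this leads to **overthinking** ([[overthinking|Overthinking]]): verbose, repetitive outputs on simple problems that substantially increase inference cost and latency.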
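## Training Methods
### Reinforcement Learning (mainstream)
- Assign a **higher reward** to correct responses produced in non-thinking mode
- Incentivize the model to skip thinking on simple problems
- Representative methods: Thinkless, AdaptThink, AutoThink, TNT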
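
The reward asymmetry described above can be sketched as a small function. This is a minimal illustration, not the formulation used by any of the listed methods: the `Rollout` type, the function name, and the three reward constants are hypothetical, chosen only so that a correct non-thinking response scores higher than a correct thinking response.

```python
from dataclasses import dataclass

# Hypothetical reward values (assumption): the only property taken from the
# text is that a correct non-thinking response earns more than a correct
# thinking response.
R_CORRECT_NON_THINKING = 1.0
R_CORRECT_THINKING = 0.5
R_WRONG = 0.0


@dataclass
class Rollout:
    """One sampled response, already labeled with its mode and correctness."""
    is_non_thinking: bool
    is_correct: bool


def mode_aware_reward(rollout: Rollout) -> float:
    """Return a reward that favors correct answers given without thinking."""
    if not rollout.is_correct:
        # Wrong answers earn nothing in either mode.
        return R_WRONG
    if rollout.is_non_thinking:
        return R_CORRECT_NON_THINKING
    return R_CORRECT_THINKING
```

Because a wrong answer earns nothing in either mode, skipping thinking pays off under this sketch only on problems the model can already solve directly, which is the intended incentive on simple queries.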
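### Supervised Fine-Tuning
- Use an SFT dataset **much larger** than the RL dataset to lock in the output format
- Used by Thinkless and others, but computationally expensive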
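## Key Challenge
RL-trained hybrid reasoning models are prone to **[[reward-hacking|Reward Hacking]]**: the model embeds thinking content in its non-thinking-mode output to collect the extra reward.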
## Mode Discrimination
1. **First-token based**: whether the first token is `</think>` (Zhang et al.; Tu et al.; TNT)
2. **Special-token based**: whether the first token is `<short>` (Fang et al.; Jiang et al.)
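
Both conventions reduce to inspecting the first generated token. The sketch below assumes that a first token equal to the marker (`</think>` or `<short>`) signals non-thinking mode and that anything else counts as thinking mode. The `Mode` enum and `detect_mode` function are hypothetical names, not taken from any of the cited works.

```python
from enum import Enum


class Mode(Enum):
    THINKING = "thinking"
    NON_THINKING = "non-thinking"


def detect_mode(tokens: list[str], non_thinking_marker: str = "</think>") -> Mode:
    """Classify a response by its first token only.

    Assumption: a first token equal to `non_thinking_marker` means the model
    skipped thinking; any other first token (or an empty response) is treated
    as thinking mode.
    """
    if tokens and tokens[0] == non_thinking_marker:
        return Mode.NON_THINKING
    return Mode.THINKING


# First-token convention: an immediate `</think>` closes an empty thought.
print(detect_mode(["</think>", "The", "answer", "is", "4"]))
# Special-token convention: pass `<short>` as the marker instead.
print(detect_mode(["<short>", "4"], non_thinking_marker="<short>"))
```

A check of this kind looks only at the first token, so it cannot tell whether the text after the marker contains reasoning.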
## See Also
- [[overthinking|Overthinking]]
- [[reward-hacking|Reward Hacking]]
- [[thinking-mode|Thinking mode]] / [[non-thinking-mode|Non-thinking mode]]
- [[gan-thinking-based-non-thinking-2026|TNT paper]]