20260429: some new stuff
33
concepts/multi-token-prediction.md
Normal file
@@ -0,0 +1,33 @@
---
title: "Multi-Token Prediction (MTP)"
domain: "Deep Learning / Training"
tags: [training, prediction, transformer, efficiency]
sources: [[deepseek-v4-million-token-context]]
---

# Multi-Token Prediction (MTP)

> **Type**: Concept (Tier 3, Placeholder)
> **Source**: [[deepseek-v4-million-token-context]], DeepSeek-V3 (2024)
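
## Overview

MTP is a training strategy in which the model predicts several subsequent tokens at each step rather than only the next one, which improves training efficiency and downstream task performance. DeepSeek-V4 inherits the MTP configuration of DeepSeek-V3 without modification.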
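
To make "predicting several subsequent tokens at each step" concrete, here is a minimal sketch of how training targets could be laid out. It is illustrative only: the helper name `mtp_targets` and the parameter `num_future` are hypothetical, and the sketch says nothing about how DeepSeek-V3 or DeepSeek-V4 actually wire their MTP modules.

```python
from typing import List, Tuple


def mtp_targets(tokens: List[int], num_future: int) -> List[Tuple[int, List[int]]]:
    """Pair each position with the next `num_future` tokens it should predict.

    Positions too close to the end of the sequence, where fewer than
    `num_future` tokens remain, are skipped for simplicity.
    """
    if num_future < 1:
        raise ValueError("num_future must be at least 1")
    pairs = []
    for i in range(len(tokens) - num_future):
        # Input token at position i, targets at positions i+1 .. i+num_future.
        pairs.append((tokens[i], tokens[i + 1 : i + 1 + num_future]))
    return pairs


# With num_future=1 this reduces to ordinary next-token prediction.
example = mtp_targets([10, 11, 12, 13, 14], num_future=2)
```

Each position here carries two targets instead of one, which is the sense in which one training step supervises several future tokens at once.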

## Core content

*This page is a placeholder created to fix a broken link in the wiki. Detailed content will be added later.*

## Relation to DeepSeek-V4

- The MTP module in DeepSeek-V4 is identical to the one in V3
- Additional MTP prediction heads increase the density of the training signal
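
The "training signal density" point can be illustrated with a small sketch that counts supervised targets per sequence and folds the extra-head losses into one objective. Everything here is an assumption made for illustration: the function names, the `weight` parameter, and the choice to average the extra losses are not taken from this page, which does not state the number of heads or the exact loss.

```python
from typing import Sequence


def supervised_targets(seq_len: int, extra_depth: int) -> int:
    """Count targets for one sequence: next-token targets plus one set per extra depth.

    Depth k (0 = ordinary next-token prediction) predicts the token k + 1
    positions ahead, so it has seq_len - 1 - k valid targets.
    """
    return sum(max(seq_len - 1 - k, 0) for k in range(extra_depth + 1))


def combined_loss(main_loss: float, mtp_losses: Sequence[float], weight: float) -> float:
    """Add the averaged extra-head losses, scaled by `weight`, to the main loss."""
    if not mtp_losses:
        return main_loss
    return main_loss + weight * sum(mtp_losses) / len(mtp_losses)


baseline = supervised_targets(seq_len=8, extra_depth=0)   # next-token only
with_mtp = supervised_targets(seq_len=8, extra_depth=1)   # one extra head
total = combined_loss(main_loss=2.0, mtp_losses=[3.0, 5.0], weight=0.5)
```

For a sequence of 8 tokens, one extra head raises the number of supervised targets from 7 to 13 without adding any new training data.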

## Related concepts

- [[test-time-scaling]]: test-time scaling

---

*Last Updated: 2026-04-27*
*Status: Placeholder, to be completed*