20260617: currently 914 pages
48
concepts/partially-observable-markov-game.md
Normal file
@@ -0,0 +1,48 @@
---
title: "Partially Observable Markov Game (POMG)"
created: 2026-06-10
updated: 2026-06-10
type: concept
tags: ["multi-agent-rl", "partial-observability", "game-theory", "markov-games"]
sources: ["[[minimax-policy-regret-pomg]]"]
---

# Partially Observable Markov Game (POMG)

A **POMG** is the multi-agent extension of a [[pomdp|POMDP]]: the actions of both players (a learner and an opponent) drive the state transitions, and each player receives only a partial observation of the state.
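
## Formal definition

M = (S, A, B, O_A, O_B, T, E^A, E^B, r, H, rho_0)

Here h indexes the step within an episode, T = (T_h), E^A = (E_h^A), E^B = (E_h^B), and Delta(X) denotes the set of probability distributions over X.

- S: state space
- A, B: action spaces of the learner and the opponent
- O_A, O_B: observation spaces of the learner and the opponent
- T_h: transition kernel, S x A x B -> Delta(S)
- E_h^A, E_h^B: emission kernels, S -> Delta(O_A) and S -> Delta(O_B)
- r: reward function (depends only on the learner's observation)
- H: episode length
- rho_0: initial state distribution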
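
## Core challenges

1. **Partial observability**: the latent state cannot be observed directly, so the learner has to reason over beliefs about it.
2. **Strategic opponent**: the opponent's behavior depends on the learner's policy, which introduces a counterfactual dependence.
3. **Standard regret breaks down**: external regret assumes the opponent would have behaved the same way under a counterfactual learner policy, which does not hold in a POMG.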
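
The gap between external regret and policy regret can be seen in a toy example. The sketch below uses a stateless two-action matrix game rather than a full POMG; the payoff table `r` and the opponent rule `respond` are hypothetical and chosen only to make the two notions disagree.

```python
import numpy as np

# Hypothetical learner payoffs: r[a, b] for learner action a and opponent action b.
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])


def respond(a):
    """Hypothetical strategic opponent: its action depends on the learner's policy."""
    return a  # plays b = 0 against a = 0 and b = 1 against a = 1


rounds = 10
played = 0  # the learner commits to action 0 in every round
realized = rounds * r[played, respond(played)]

# External regret: alternatives are scored against the opponent actions actually played.
external = max(rounds * r[a, respond(played)] for a in range(2)) - realized

# Policy regret: alternatives are scored against the opponent's response to the alternative.
policy = max(rounds * r[a, respond(a)] for a in range(2)) - realized

print(external, policy)  # 0.0 10.0
```

External regret reports that action 0 was optimal, because it holds the opponent's actions fixed. Policy regret accounts for the opponent reacting to a switch and shows that committing to action 1 would have earned twice as much.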
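
## Structural assumptions

Tractable learning relies on two key assumptions:

- [[weak-revealing-condition|Weak Revealing]]: the observations are informative enough to identify the world dynamics.
- [[posterior-lipschitz-adversary|Posterior-Lipschitz opponent]]: the opponent's response varies smoothly (in a Lipschitz sense).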
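
As a rough illustration of what "informative enough" can mean, the POMDP literature often formalizes weak revealing as a lower bound on the S-th singular value of each emission matrix. Whether the linked note uses exactly this form is an assumption; the function name `revealing_margin` is hypothetical.

```python
import numpy as np


def revealing_margin(E):
    """S-th largest singular value of an (S, O) emission matrix.

    Assumed criterion: a margin bounded away from 0 means distinct state
    distributions induce distinguishable observation distributions.
    """
    S, O = E.shape
    if O < S:
        return 0.0  # fewer observations than states: rank is below S
    return float(np.linalg.svd(E, compute_uv=False)[S - 1])


informative = np.eye(3)            # each state emits its own observation
uninformative = np.full((2, 2), 0.5)  # every state emits the same distribution

margin_good = revealing_margin(informative)
margin_bad = revealing_margin(uninformative)
```

In this sketch the identity emission gives a margin of 1, while the uniform emission has rank 1 and a margin of (numerically) 0.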

## [[causal-decomposition-pomg|Causal decomposition]]

The [[observable-operator-model|OOM]] operators of a POMG factor into:

- World channel W_h (transition + emission; depends only on the world parameters theta)
- Opponent aggregate G_h (the opponent's response; depends on Phi)
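
One way to picture this factorization is a state-space analogue for a single step. The sketch below assumes a memoryless opponent that picks its action from its current observation via a table `g[o_B, b]`; that opponent model, the array shapes, and the function names are illustrative assumptions, and the exact definitions of W_h and G_h in the linked notes may differ.

```python
import numpy as np


def world_channel(T_h, E_A_next, a, b, o):
    """W[s', s] = E_A(o | s') * T_h(s' | s, a, b); uses world parameters only."""
    return E_A_next[:, o][:, None] * T_h[:, a, b, :].T


def opponent_aggregate(E_B_h, g):
    """G[s, b] = sum over o_B of E_B(o_B | s) * g(b | o_B); carries the opponent's response."""
    return E_B_h @ g


def step_operator(T_h, E_A_next, E_B_h, g, a, o):
    """Mix the world channels over opponent actions, weighted by the aggregate."""
    G = opponent_aggregate(E_B_h, g)
    return sum(
        world_channel(T_h, E_A_next, a, b, o) * G[:, b][None, :]
        for b in range(G.shape[1])
    )


rng = np.random.default_rng(1)
S, A, B, O_A, O_B = 3, 2, 2, 4, 3
T_h = rng.dirichlet(np.ones(S), size=(S, A, B))   # T_h[s, a, b, s']
E_A_next = rng.dirichlet(np.ones(O_A), size=S)    # E_A_next[s', o]
E_B_h = rng.dirichlet(np.ones(O_B), size=S)       # E_B_h[s, o_B]
g = rng.dirichlet(np.ones(B), size=O_B)           # g[o_B, b], hypothetical response table

belief = np.full(S, 1.0 / S)
unnormalized = step_operator(T_h, E_A_next, E_B_h, g, a=0, o=1) @ belief
next_belief = unnormalized / unnormalized.sum()
total_mass = sum(step_operator(T_h, E_A_next, E_B_h, g, a=0, o=o) for o in range(O_A)).sum(axis=0)
```

Changing `g` alters only `opponent_aggregate`, while `world_channel` stays fixed, which mirrors the separation between theta and Phi described above.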

## References

- [[minimax-policy-regret-pomg|Minimax-Optimal Policy Regret in POMGs]]
- [[pomdp|POMDP]]
- [[policy-regret|Policy Regret]]