| title | source | authors | affiliation | year | category | published | venue |
|---|---|---|---|---|---|---|---|
| Minimax-Optimal Policy Regret in Partially Observable Markov Games | arXiv:2606.02363v1 | Raman Arora | Johns Hopkins University | 2026 | cs.LG, stat.ML | 2026-06-01 | ICML 2026 |
Minimax-Optimal Policy Regret in Partially Observable Markov Games
Author: Raman Arora (Johns Hopkins University)
arXiv: 2606.02363v1 [cs.LG, stat.ML]
Venue: ICML 2026, Seoul
Published: 2026-06-01
Abstract
We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy. This makes standard regret notions inadequate: they score alternative strategies against the adversary's realized actions and ignore how the adversary would have responded to them.
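To see why standard regret can mislead against an adaptive opponent, consider a toy, fully observed repeated game (not taken from the paper). The learner faces a tit-for-tat adversary with memory 1 under prisoner's-dilemma payoffs; the payoffs, the adversary, and all function names are illustrative assumptions, and the comparator is simply the best fixed action. External regret replays the adversary's realized actions, while policy regret lets the adversary react afresh to each counterfactual action sequence.

```python
# Toy illustration (assumed setup, not from the paper): external vs policy regret.
# Actions: 0 = cooperate, 1 = defect. REWARD[a][b] = learner's reward when the
# learner plays a and the adversary plays b (prisoner's-dilemma-style payoffs).
REWARD = [[3, 0], [5, 1]]


def tit_for_tat(learner_history):
    """Hypothetical adaptive adversary with memory 1: copy the learner's last action."""
    return learner_history[-1] if learner_history else 0


def play(learner_actions, adversary):
    """Roll out the interaction; return (learner's total reward, adversary's actions)."""
    history, opp_actions, total = [], [], 0
    for a in learner_actions:
        b = adversary(history)  # the adversary reacts to what it has seen so far
        total += REWARD[a][b]
        opp_actions.append(b)
        history.append(a)
    return total, opp_actions


def external_regret(learner_actions, adversary):
    """Best fixed action scored against the *realized* adversary actions, held fixed."""
    total, opp = play(learner_actions, adversary)
    best = max(sum(REWARD[a][b] for b in opp) for a in (0, 1))
    return best - total


def policy_regret(learner_actions, adversary):
    """Best fixed action when the adversary *re-reacts* to it (counterfactual rollout)."""
    total, _ = play(learner_actions, adversary)
    horizon = len(learner_actions)
    best = max(play([a] * horizon, adversary)[0] for a in (0, 1))
    return best - total
```

With 100 rounds, always cooperating has external regret 200 but policy regret 0, while always defecting has external regret 0 but policy regret 196. Against an adaptive opponent, the two notions can rank the same strategies in opposite order.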
We prove that an epoch-based optimistic maximum-likelihood algorithm achieves $\tilde{O}(\sqrt{T})$ policy regret, with explicit dependence on the horizon, adversary memory, confidence radius, and the aggregate eluder dimension of the observable-operator class. A matching lower bound confirms minimax optimality. Extensions include horizon-adaptive guarantees and adversaries with geometric fading memory.
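The optimistic-MLE principle can be illustrated with a deliberately small sketch. It assumes a finite model class, Bernoulli rewards, and a hypothetical adversary whose response settles only after the learner has repeated the same policy for `memory` rounds. Each epoch keeps the models whose log-likelihood is within `beta` of the maximum, plays the policy with the highest optimistic value, and commits to it for the whole epoch. Discarding the first `memory` rounds of each epoch is our assumption about how epochs can neutralize adversary memory. It is not a description of the paper's algorithm, which works with observable-operator models of a full POMG. Every class and function name here is hypothetical.

```python
import math
import random


def bernoulli_loglik(mean, rewards):
    """Log-likelihood of 0/1 rewards under Bernoulli(mean); assumes 0 < mean < 1."""
    return sum(math.log(mean if r else 1.0 - mean) for r in rewards)


class MemoryAdversaryEnv:
    """Hypothetical toy environment. The reward mean for policy k settles to
    true_means[k] only after the learner has played k for `memory` rounds in a
    row; before that the adversary is still adapting and the mean is 0.5."""

    def __init__(self, true_means, memory, seed=0):
        self.true_means, self.memory = true_means, memory
        self.history = []
        self.rng = random.Random(seed)

    def step(self, k):
        recent = self.history[-self.memory:] if self.memory else []
        settled = len(recent) == self.memory and all(a == k for a in recent)
        self.history.append(k)
        mean = self.true_means[k] if settled else 0.5
        return 1 if self.rng.random() < mean else 0


def epoch_optimistic_mle(env, models, n_epochs, epoch_len, memory, beta):
    """Toy optimistic-MLE loop over a finite model class.

    Each model is a tuple of settled reward means, one entry per policy.
    Returns the policy committed to in each epoch."""
    data = {k: [] for k in range(len(models[0]))}  # settled rewards per policy
    plays = []
    for _ in range(n_epochs):
        # 1. Log-likelihood of every candidate model on the settled data so far.
        ll = [sum(bernoulli_loglik(m[k], data[k]) for k in data) for m in models]
        best_ll = max(ll)
        # 2. Confidence set: models within beta of the maximum likelihood.
        conf_set = [m for m, l in zip(models, ll) if l >= best_ll - beta]
        # 3. Optimism: the (model, policy) pair with the largest predicted mean;
        #    ties are broken toward the larger policy index by tuple comparison.
        _, policy = max((m[k], k) for m in conf_set for k in range(len(m)))
        # 4. Commit for a whole epoch; the first `memory` rounds are transient
        #    (the adversary may still be reacting) and are not used for fitting.
        for t in range(epoch_len):
            r = env.step(policy)
            if t >= memory:
                data[policy].append(r)
        plays.append(policy)
    return plays
```

For example, with true settled means `(0.8, 0.2)`, models `[(0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]`, and `beta = math.log(len(models) / 0.05)` (a common radius for a finite class), the first epoch is an optimistic guess that picks policy 1. Its low settled rewards should push `(0.2, 0.8)` out of the confidence set, after which the learner commits to policy 0.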
Key Concepts
- partially-observable-markov-game — core model: partial observability + strategic adversary
- policy-regret — counterfactual regret against adaptive opponents
- eluder-dimension — sequential complexity measure
- observable-operator-model — operator-based representation of POMG dynamics
- posterior-lipschitz-adversary — smoothness assumption
- weak-revealing-condition — observation informativeness condition
- causal-decomposition-pomg — separating world from adversary
- epoch-based-optimistic-mle — the algorithm
- minimax-optimality — matching upper and lower bounds
- pomdp — single-agent precursor
- adaptive-adversary — strategic opponent model
- fading-memory — adversary memory extension
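The observable-operator idea can be shown in its simplest single-agent, action-free form, a hidden Markov model: the probability of an observation sequence is a product of per-observation operators applied to the initial state distribution, $P(o_1,\dots,o_t) = \mathbf{1}^\top A(o_t)\cdots A(o_1)\,b_0$ with $A(o) = T\,\mathrm{diag}(O[o,\cdot])$. The sketch below assumes a column-stochastic transition matrix `T[s2][s] = P(s2 | s)` and emission matrix `O[o][s] = P(o | s)`; the function names are hypothetical. In a POMG the operators would additionally depend on the learner's and adversary's actions, which this sketch omits.

```python
import itertools


def operator(T, O, o):
    """Observable operator A(o) = T @ diag(O[o]) for a single observation o."""
    n = len(T)
    return [[T[i][j] * O[o][j] for j in range(n)] for i in range(n)]


def matvec(A, v):
    """Plain-Python matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def oom_prob(T, O, b0, obs):
    """P(o_1, ..., o_t) = 1^T A(o_t) ... A(o_1) b0.

    After k operators, v[s] = P(o_1..o_k, s_{k+1} = s); summing out the
    final state gives the sequence probability."""
    v = list(b0)
    for o in obs:
        v = matvec(operator(T, O, o), v)
    return sum(v)


def brute_force_prob(T, O, b0, obs):
    """Same probability by summing over every hidden-state path (obs nonempty)."""
    n, total = len(b0), 0.0
    for path in itertools.product(range(n), repeat=len(obs)):
        p = b0[path[0]] * O[obs[0]][path[0]]
        for k in range(1, len(obs)):
            p *= T[path[k]][path[k - 1]] * O[obs[k]][path[k]]
        total += p
    return total
```

With `T = [[0.9, 0.2], [0.1, 0.8]]`, `O = [[0.7, 0.1], [0.3, 0.9]]` and `b0 = [0.5, 0.5]`, `oom_prob(T, O, b0, [0])` gives 0.4 (that is, 0.5 · 0.7 + 0.5 · 0.1). The operator form agrees with brute-force enumeration while costing time linear, not exponential, in the sequence length.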