title	source	authors	affiliation	year	category	published	venue
Minimax-Optimal Policy Regret in Partially Observable Markov Games	arXiv:2606.02363v1	Raman Arora	Johns Hopkins University	2026	cs.LG, stat.ML	2026-06-01	ICML 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Author: Raman Arora (Johns Hopkins University)
arXiv: 2606.02363v1 [cs.LG, stat.ML]
Venue: ICML 2026, Seoul
Published: 2026-06-01

Abstract

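We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy. This dependence makes standard (external) regret inadequate: it compares against alternative actions while holding the opponent's realized behavior fixed, ignoring how the opponent would have responded to a different strategy.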
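To see why external regret can mislead against an adaptive opponent, here is a minimal sketch in a deliberately simplified setting: a single-state, fully observed repeated game (not a POMG) with a hypothetical memory-1 adversary that copies the learner's previous action. The loss table `LOSS`, the constant `INITIAL_RESPONSE`, and all function names are illustrative assumptions, not the paper's construction.

```python
# Hypothetical loss table LOSS[a][y]: learner action a, adversary response y.
LOSS = [[0.5, 1.0],
        [1.0, 0.0]]
INITIAL_RESPONSE = 0  # hypothetical adversary response before any learner action


def play(actions):
    """Run the interaction; the memory-1 adversary copies the learner's previous action."""
    total, responses, prev = 0.0, [], None
    for a in actions:
        y = INITIAL_RESPONSE if prev is None else prev
        responses.append(y)
        total += LOSS[a][y]
        prev = a
    return total, responses


def external_regret(actions):
    """Compare against fixed actions replayed on the *realized* adversary responses."""
    total, responses = play(actions)
    best = min(sum(LOSS[a][y] for y in responses) for a in (0, 1))
    return total - best


def policy_regret(actions):
    """Compare against fixed actions with the adversary re-reacting to them (counterfactual)."""
    total, _ = play(actions)
    best = min(play([a] * len(actions))[0] for a in (0, 1))
    return total - best


print(external_regret([0] * 100), policy_regret([0] * 100))  # prints 0.0 49.0
```

A learner that always plays action 0 has zero external regret, yet committing to action 1 would have steered the adversary into the cheaper response; only the counterfactual comparator in `policy_regret` captures this.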

We prove that an epoch-based optimistic maximum-likelihood algorithm achieves $\tilde{O}(\sqrt{T})$ policy regret, with explicit dependence on the horizon, the adversary's memory, the confidence radius, and the aggregate eluder dimension of the observable-operator class. A matching lower bound shows that this rate is minimax-optimal. Extensions include horizon-adaptive guarantees and adversaries with geometrically fading memory.
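The abstract names the algorithm but not its details. Below is a minimal sketch of the generic "optimistic MLE with epochs" template under strong simplifying assumptions: a hypothetical finite model class `MODELS` giving a Bernoulli outcome probability per policy, a doubling epoch schedule, a fixed confidence radius `beta`, and no adversary or latent state. It is not the paper's algorithm and omits the observable-operator parameterization entirely.

```python
import math
import random

# Hypothetical finite model class: MODELS[i][pi] = success probability of policy pi under model i.
MODELS = [
    [0.2, 0.8],
    [0.8, 0.2],
    [0.5, 0.5],
]


def log_likelihood(model, data):
    """Sum of Bernoulli log-likelihoods of the (policy, outcome) pairs in data under model."""
    ll = 0.0
    for pi, obs in data:
        p = model[pi]
        ll += math.log(p if obs else 1.0 - p)
    return ll


def optimistic_choice(data, beta):
    """Keep models within beta of the maximum log-likelihood, then pick the most optimistic policy."""
    lls = [log_likelihood(m, data) for m in MODELS]
    best_ll = max(lls)
    confidence_set = [m for m, ll in zip(MODELS, lls) if ll >= best_ll - beta]
    # Optimism: maximize predicted value jointly over models in the set and policies.
    _, pi = max((m[p], p) for m in confidence_set for p in range(len(m)))
    return pi


def run(true_model, num_epochs, beta=2.0, seed=0):
    """Doubling epochs: re-plan only at epoch boundaries, then commit to pi for 2**k rounds."""
    rng = random.Random(seed)
    data, chosen = [], []
    for k in range(num_epochs):
        pi = optimistic_choice(data, beta)
        chosen.append(pi)
        for _ in range(2 ** k):
            data.append((pi, rng.random() < true_model[pi]))
    return chosen
```

Re-planning only at epoch boundaries reflects a common motivation for epoch-based designs under policy regret (assumed here, not confirmed by the note): after a policy switch, an opponent with finite memory sees a mixed history only for a bounded number of rounds, so long epochs amortize that cost.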

Key Concepts

partially-observable-markov-game — core model: partial observability + strategic adversary
policy-regret — counterfactual regret against adaptive opponents
eluder-dimension — sequential complexity measure
observable-operator-model — operator-based representation of POMG dynamics
posterior-lipschitz-adversary — smoothness assumption
weak-revealing-condition — observation informativeness condition
causal-decomposition-pomg — separating world from adversary
epoch-based-optimistic-mle — the algorithm
minimax-optimality — matching upper and lower bounds
pomdp — single-agent precursor
adaptive-adversary — strategic opponent model
fading-memory — adversary memory extension
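One way to read the fading-memory extension is to let the opponent weight the learner's action from $k$ steps back by $(1-\gamma)\gamma^{k-1}$ for some $\gamma \in (0, 1)$. This weighting and the helper names below are assumptions for illustration; the note does not state the paper's exact model. Under it, the weight mass older than $m$ steps is $\gamma^m$, which suggests treating the opponent as an $m$-memory one up to an $\varepsilon$-sized error once $\gamma^m \le \varepsilon$ (assuming its responses depend smoothly on this weighted history).

```python
def fading_weights(gamma, history_len):
    """Hypothetical weights (1 - gamma) * gamma**(k - 1) on the action k steps back, k = 1..history_len.

    Over an infinite history they sum to 1; truncated at history_len they sum to 1 - gamma**history_len.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return [(1.0 - gamma) * gamma ** (k - 1) for k in range(1, history_len + 1)]


def effective_memory(gamma, eps):
    """Smallest m such that the weight mass older than m steps, gamma**m, is at most eps."""
    if not 0.0 < gamma < 1.0 or eps <= 0.0:
        raise ValueError("need 0 < gamma < 1 and eps > 0")
    m, tail = 0, 1.0
    while tail > eps:
        tail *= gamma
        m += 1
    return m
```

For example, `effective_memory(0.9, 0.01)` returns 44: with $\gamma = 0.9$, actions more than 44 steps back carry at most 1% of the weight.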
