| title | source | authors | affiliation | year | category | published | venue |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Minimax-Optimal Policy Regret in Partially Observable Markov Games | arXiv:2606.02363v1 | Raman Arora | Johns Hopkins University | 2026 | cs.LG, stat.ML | 2026-06-01 | ICML 2026 |

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Author: Raman Arora (Johns Hopkins University)
arXiv: 2606.02363v1 [cs.LG, stat.ML]
Venue: ICML 2026, Seoul
Published: 2026-06-01

Abstract

We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy, making standard regret notions inadequate.
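To make the gap between standard (external) regret and policy regret concrete, here is a minimal toy sketch. It is not taken from the paper: the two-action game, the memory-1 adversary `adversary_loss`, and the helper names are all hypothetical, chosen only to show how an adversary that reacts to the learner's past play can make external regret look perfect while policy regret grows linearly.

```python
from typing import List

ACTIONS = (0, 1)


def adversary_loss(prev_action: int, action: int) -> float:
    """Hypothetical memory-1 adversary (illustration only).

    It charges 1.0 whenever the learner's previous action was 0,
    plus a small 0.1 fee for playing action 1 in the current round.
    """
    penalty = 1.0 if prev_action == 0 else 0.0
    fee = 0.1 if action == 1 else 0.0
    return penalty + fee


def previous_actions(actions: List[int]) -> List[int]:
    # Convention: the "previous" action of round 1 is the first action itself.
    return [actions[0]] + actions[:-1]


def realized_loss(actions: List[int]) -> float:
    return sum(adversary_loss(p, a) for p, a in zip(previous_actions(actions), actions))


def external_regret(actions: List[int]) -> float:
    """Compare against the best fixed action on the realized loss functions.

    The adversary's reaction to the learner's actual history is frozen.
    """
    prevs = previous_actions(actions)
    best_fixed = min(sum(adversary_loss(p, b) for p in prevs) for b in ACTIONS)
    return realized_loss(actions) - best_fixed


def policy_regret(actions: List[int]) -> float:
    """Compare against the best fixed action replayed from the start.

    The adversary now reacts to the comparator's own history.
    """
    horizon = len(actions)
    best_fixed = min(horizon * adversary_loss(b, b) for b in ACTIONS)
    return realized_loss(actions) - best_fixed


if __name__ == "__main__":
    always_zero = [0] * 1000
    print("external regret:", external_regret(always_zero))
    print("policy regret:  ", round(policy_regret(always_zero), 6))
```

In this toy setting a learner that always plays action 0 has zero external regret, because on the loss sequence it actually induced, action 0 is the best fixed choice. Its policy regret is about 0.9 per round, since consistently playing action 1 would have changed the adversary's behavior.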

We prove that an epoch-based optimistic maximum-likelihood algorithm achieves $\tilde{O}(\sqrt{T})$ policy regret, with explicit dependence on the horizon, adversary memory, confidence radius, and the aggregate Eluder dimension of the observable-operator class. A matching lower bound confirms minimax optimality. Extensions include horizon-adaptive guarantees and adversaries with geometric fading memory.
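For orientation, policy regret against an adversary with bounded memory is commonly formalized along the following lines. This is a generic sketch from the policy-regret literature, not the paper's own statement: the symbols $V$, $g$, $m$, $\pi_t$, and $\Pi$ are assumptions introduced here, and the paper's exact definition may differ.

```latex
% Assumed notation (not from the source):
%   \pi_t : learner's policy in episode t;  \Pi : comparator policy class
%   g     : adversary's response map, depending on the last m learner policies
%   V     : learner's expected return against a given adversary strategy
\[
\mathrm{PReg}_T
  \;=\; \max_{\pi \in \Pi} \sum_{t=1}^{T}
        V\bigl(\pi,\; g(\underbrace{\pi, \dots, \pi}_{m})\bigr)
  \;-\; \sum_{t=1}^{T}
        V\bigl(\pi_t,\; g(\pi_{t-m+1}, \dots, \pi_t)\bigr)
\]
```

Under this reading, a bound of order $\sqrt{T}$ (up to logarithmic factors) means the average policy regret $\mathrm{PReg}_T / T$ vanishes at a rate of roughly $1/\sqrt{T}$.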

Key Concepts