| title | source_url | ingested | sha256 |
|---|---|---|---|
| Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets | https://arxiv.org/abs/2606.10979 | 2026-06-17 | `<computed>` |

Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets

Authors: Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA

arXiv: 2606.10979v1 [cs.AI] (2026-06-09)

Abstract

Proposes Bellman–Taylor score decoding, a framework that moves policy learning into a Euclidean score space while an action decoder maps each score vector to an action in the state's feasible set. Motivated by a Taylor expansion of the optimal action-value function. The resulting latent-score MDP can be optimized by standard DRL algorithms without differentiating through the decoder. Provides a performance guarantee that decomposes the optimality gap into a structural approximation error plus an algorithmic learning error. Applied to queueing network control, where the method learns a state-dependent, index-based dispatching rule.
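The abstract does not specify the decoder, so the following is only a minimal sketch, assuming the simplest choice (argmax of the scores restricted to feasible actions) and a toy single-server queueing model in which only nonempty queues can be served. It illustrates how a score vector becomes a feasible action and why the learner never needs gradients through that step. The names `decode_action`, `score_space_step`, and `toy_serve`, the feasibility rule, and the reward are all hypothetical; the paper's actual decoder and index construction may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple


def decode_action(scores: Sequence[float], feasible: Sequence[bool]) -> int:
    """Hypothetical decoder: argmax of the scores over feasible actions only.

    Ties go to the lowest action index. Raises if no action is feasible.
    """
    best_action, best_score = None, -math.inf
    for action, (score, ok) in enumerate(zip(scores, feasible)):
        if ok and (best_action is None or score > best_score):
            best_action, best_score = action, score
    if best_action is None:
        raise ValueError("no feasible action in this state")
    return best_action


def score_space_step(
    queue_lengths: List[int],
    scores: Sequence[float],
    env_step: Callable[[List[int], int], Tuple[List[int], float]],
) -> Tuple[List[int], float, int]:
    """One transition of an illustrative latent-score MDP.

    The agent's action is the raw score vector. Feasibility (assumed here:
    only nonempty queues can be served) is enforced by the decoder inside
    the transition, so the learner treats decoding as part of the environment.
    """
    feasible = [q > 0 for q in queue_lengths]
    action = decode_action(scores, feasible)
    next_state, reward = env_step(queue_lengths, action)
    return next_state, reward, action


def toy_serve(queue_lengths: List[int], action: int) -> Tuple[List[int], float]:
    """Toy deterministic transition (hypothetical): serve one job from the
    chosen queue; reward is minus the total number of jobs left."""
    next_state = list(queue_lengths)
    next_state[action] -= 1
    return next_state, -float(sum(next_state))


if __name__ == "__main__":
    state = [0, 4, 2]
    scores = [2.0, 0.5, 1.2]  # an unconstrained argmax would pick empty queue 0
    next_state, reward, action = score_space_step(state, scores, toy_serve)
    print(action, next_state, reward)  # 2 [0, 4, 1] -5.0
```

In this sketch the highest raw score belongs to an empty queue, yet the decoder still returns a feasible action. Because the agent only ever outputs `scores`, any continuous-action DRL algorithm could in principle be plugged in unchanged, which is the property the abstract highlights.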

Key Concepts