---
title: "Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
source_url: https://arxiv.org/abs/2606.10979
ingested: 2026-06-17
sha256:
---

# Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets

**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo

**Affiliation:** HKUST, Dept. of IEDA

**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)

## Abstract

The paper proposes Bellman–Taylor score decoding, a framework that moves policy learning into a Euclidean score space and enforces feasibility through an action decoder. The design is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized by standard DRL algorithms without differentiating through the decoder. The paper provides a performance guarantee in which the optimality gap decomposes into a structural approximation error plus an algorithmic learning error. The framework is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.

## Key Concepts

- [[bellman-taylor-score-decoding|Bellman–Taylor score decoding]]
- [[latent-score-mdp|Latent-score MDP]]
- [[state-dependent-feasible-action-sets|State-dependent feasible action sets]]
- [[action-decoder|Action decoder]]
- [[post-action-configuration|Post-action configuration]]
- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
- [[queueing-network-control|Queueing network control]]
- [[btsd-ppo|BTSD-PPO]]
- [[continuation-value-function|Continuation value function]]
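
## Sketch: Decoding Scores into Feasible Actions

The abstract describes a policy that outputs unconstrained scores and a decoder that turns them into a feasible action. The sketch below is one possible illustration of that idea for a dispatching setting, not the paper's decoder. Everything in it is an assumption made for the example: the names `decode_scores` and `is_feasible`, the greedy priority rule, and the feasible set itself (serve at most `queue_lengths[i]` jobs of class `i`, and at most `num_servers` jobs in total).

```python
from typing import List, Sequence


def decode_scores(
    scores: Sequence[float],
    queue_lengths: Sequence[int],
    num_servers: int,
) -> List[int]:
    """Map an unconstrained score vector to a feasible dispatch action.

    Assumed (hypothetical) feasible set for this sketch:
    0 <= action[i] <= queue_lengths[i] and sum(action) <= num_servers.
    """
    if len(scores) != len(queue_lengths):
        raise ValueError("scores and queue_lengths must have the same length")

    action = [0] * len(scores)
    remaining = num_servers
    # Visit classes from highest to lowest score; ties go to the lower index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    for i in order:
        if remaining <= 0:
            break
        served = min(queue_lengths[i], remaining)
        action[i] = served
        remaining -= served
    return action


def is_feasible(
    action: Sequence[int],
    queue_lengths: Sequence[int],
    num_servers: int,
) -> bool:
    """Check the assumed state-dependent constraints."""
    within_queues = all(0 <= a <= q for a, q in zip(action, queue_lengths))
    return within_queues and sum(action) <= num_servers


# Scores may be any real numbers; the decoder guarantees feasibility.
example_action = decode_scores([0.2, 1.5, -0.3], [4, 1, 2], num_servers=3)
print(example_action)  # [2, 1, 0]
```

Because the decoder can be treated as part of the environment, a standard DRL algorithm only needs to learn the score vector. The sorting step never has to be differentiated, which matches the abstract's point about not differentiating through the decoder.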