---
title: "Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
source_url: https://arxiv.org/abs/2606.10979
ingested: 2026-06-17
sha256: <computed>
---

# Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets

**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA

**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)

## Abstract

Proposes Bellman–Taylor score decoding, a framework that moves policy learning into a Euclidean score space while an action decoder enforces feasibility with respect to the state-dependent action sets. The construction is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized by standard deep reinforcement learning (DRL) algorithms without differentiating through the decoder. The paper provides a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. The framework is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
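A minimal sketch of the score-decoding idea, assuming the simplest possible decoder (pick the highest-scoring action among those currently feasible) and a toy single-server queue. `decode_action`, `ToyQueue`, and `LatentScoreEnv` are hypothetical names for illustration; the paper's decoder and queueing model may differ. The agent's action is the real-valued score vector itself, so an off-the-shelf DRL learner only sees a Euclidean action space, and no gradient has to pass through the discrete argmax.

```python
from typing import Optional

import numpy as np


def decode_action(scores: np.ndarray, feasible: np.ndarray) -> Optional[int]:
    """Hypothetical decoder: return the feasible action with the largest score.

    scores   -- real-valued score vector produced by the policy, one entry per action
    feasible -- boolean mask of the actions allowed in the current state
    Returns None when no action is feasible (the caller treats that as idling).
    """
    if not feasible.any():
        return None
    # Infeasible actions get -inf so argmax can never select them.
    masked = np.where(feasible, scores, -np.inf)
    return int(np.argmax(masked))


class ToyQueue:
    """Hypothetical single-server queue with K job classes.

    Serving class k is feasible only if queue k is nonempty, so the
    feasible action set changes with the state.
    """

    def __init__(self, queue_lengths, arrival_probs, seed=0):
        self.init = np.array(queue_lengths, dtype=int)
        self.p = np.array(arrival_probs, dtype=float)
        self.rng = np.random.default_rng(seed)
        self.q = self.init.copy()

    def reset(self):
        self.q = self.init.copy()
        return self.q.copy()

    def feasible_mask(self):
        return self.q > 0

    def step(self, action):
        if action is not None:
            self.q[action] -= 1  # serve one job of the chosen class
        # At most one Bernoulli arrival per class per step.
        arrivals = (self.rng.random(self.q.size) < self.p).astype(int)
        self.q += arrivals
        reward = -float(self.q.sum())  # negative holding cost
        return self.q.copy(), reward


class LatentScoreEnv:
    """Hypothetical latent-score wrapper: the agent's action is a score vector
    in R^K and the decoder turns it into a feasible action of the base MDP.
    The decoder runs inside the environment, so a policy-gradient learner
    never differentiates through it."""

    def __init__(self, base_env):
        self.base_env = base_env

    def reset(self):
        return self.base_env.reset()

    def step(self, scores):
        feasible = self.base_env.feasible_mask()
        action = decode_action(np.asarray(scores, dtype=float), feasible)
        return self.base_env.step(action)


env = LatentScoreEnv(ToyQueue([3, 0, 5], [0.2, 0.3, 0.1]))
state = env.reset()
scores = np.array([0.5, 2.0, 1.0])  # in practice produced by a state-conditioned policy
print(decode_action(scores, state > 0))  # → 2 (class 1 has the top score but is empty)
next_state, reward = env.step(scores)
```

Because the scores would come from a policy conditioned on the state, the priority order among classes can change from state to state, which is one way to read "state-dependent index-based dispatching rule". `np.argmax` breaks ties toward the lowest index.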

## Key Concepts

- [[bellman-taylor-score-decoding|Bellman–Taylor score decoding]]
- [[latent-score-mdp|Latent-score MDP]]
- [[state-dependent-feasible-action-sets|State-dependent feasible action sets]]
- [[action-decoder|Action decoder]]
- [[post-action-configuration|Post-action configuration]]
- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
- [[queueing-network-control|Queueing network control]]
- [[btsd-ppo|BTSD-PPO]]
- [[continuation-value-function|Continuation value function]]
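The sketch below shows how the Taylor-expansion, post-action-configuration, and continuation-value concepts above could fit together as a first-order expansion. This is an illustrative reconstruction, not the paper's stated derivation. The notation is assumed for this note: reward $r$, discount $\gamma$, post-action configuration $y(s,a)$, continuation value $W$, reference point $\bar y$, and feasible set $\mathcal{A}(s)$.

$$
\begin{aligned}
Q^*(s,a) &= r(s,a) + \gamma\, W\big(y(s,a)\big), \\
W\big(y(s,a)\big) &\approx W(\bar y) + \nabla W(\bar y)^\top \big(y(s,a) - \bar y\big), \\
\arg\max_{a \in \mathcal{A}(s)} Q^*(s,a) &\approx \arg\max_{a \in \mathcal{A}(s)} \Big[\, r(s,a) + \gamma\, \theta^\top y(s,a) \Big], \qquad \theta = \nabla W(\bar y).
\end{aligned}
$$

The terms $W(\bar y)$ and $-\gamma\,\theta^\top \bar y$ do not depend on $a$, so they drop out of the argmax. Under this reading, $\theta$ is a Euclidean vector that a policy could output, and the constrained argmax over $\mathcal{A}(s)$ plays the role of the action decoder. Letting $\bar y$ depend on $s$ makes $\theta$ state-dependent. The neglected higher-order terms are where an approximation error would enter.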