---
title: "Bellman-Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
source_url: https://arxiv.org/abs/2606.10979
ingested: 2026-06-17
sha256: <computed>
---
# Bellman-Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets
**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA
**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)
## Abstract
Proposes Bellman-Taylor score decoding, a framework that moves policy learning into a Euclidean score space and enforces feasibility through an action decoder. The construction is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can be optimized with standard DRL algorithms without differentiating through the decoder. The paper provides a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. The framework is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
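
To make the score-space idea concrete, here is a minimal sketch of a decoder for a toy dispatching problem. It is an illustration under stated assumptions, not the paper's construction: the names `feasible_queues` and `decode_action` are hypothetical, feasibility is assumed to mean "the queue is non-empty", and the decoder is assumed to be a simple argmax over feasible actions. The policy emits an unconstrained real-valued score per queue, and the decoder turns those scores into an action that is feasible in the current state.

```python
from typing import List, Sequence


def feasible_queues(queue_lengths: Sequence[int]) -> List[bool]:
    """Hypothetical state-dependent feasible set: only non-empty queues can be served."""
    return [n > 0 for n in queue_lengths]


def decode_action(scores: Sequence[float], feasible: Sequence[bool]) -> int:
    """Hypothetical decoder: pick the feasible action with the highest score.

    The scores live in an unconstrained Euclidean space; feasibility is
    enforced here, not by the policy that produced the scores.
    """
    candidates = [i for i, ok in enumerate(feasible) if ok]
    if not candidates:
        raise ValueError("no feasible action in this state")
    return max(candidates, key=lambda i: scores[i])


# Toy state: queue 1 is empty, so serving it is infeasible.
queue_lengths = [3, 0, 5]
# Scores as a policy network might emit them (made-up values).
scores = [0.2, 1.5, 0.9]

action = decode_action(scores, feasible_queues(queue_lengths))
print(action)  # 2: queue 1 has the top score but is infeasible
```

In this sketch the decoder is treated as part of the environment, so a standard DRL algorithm would only see score vectors as its actions and would never need gradients through `decode_action`.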
## Key Concepts
- [[bellman-taylor-score-decoding|Bellman-Taylor score decoding]]
- [[latent-score-mdp|latent-score MDP]]
- [[state-dependent-feasible-action-sets|state-dependent feasible action sets]]
- [[action-decoder|action decoder]]
- [[post-action-configuration|post-action configuration]]
- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
- [[queueing-network-control|queueing network control]]
- [[btsd-ppo|BTSD-PPO]]
- [[continuation-value-function|continuation value function]]