---
title: "Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
source_url: https://arxiv.org/abs/2606.10979
ingested: 2026-06-17
sha256: <computed>
---

# Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets

**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo (HKUST, Dept. of IEDA)

**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)

## Abstract

The paper proposes Bellman–Taylor score decoding, a framework that moves policy learning into a Euclidean score space and enforces feasibility through an action decoder. The construction is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized with standard deep reinforcement learning (DRL) algorithms without differentiating through the decoder. The paper provides a performance guarantee that decomposes the optimality gap into a structural approximation error plus an algorithmic learning error. The method is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
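
To make the score-space idea concrete, here is a minimal Python sketch of a latent-score environment for a toy queueing system. It is an illustration under stated assumptions, not the paper's construction: the argmax-over-feasible-actions decoder, the `decode_action` helper, the `LatentScoreQueueEnv` class, and the holding-cost reward are all hypothetical. The point it illustrates is that the agent emits an unconstrained score vector, the decoder sits inside the environment, and so a standard DRL algorithm never needs gradients through the decoder.

```python
from typing import List, Sequence, Tuple


def decode_action(scores: Sequence[float], feasible: Sequence[int]) -> int:
    """Hypothetical decoder: pick the feasible action with the highest score.

    Ties are broken in favor of the lowest action index.
    """
    if not feasible:
        raise ValueError("feasible action set is empty")
    return max(feasible, key=lambda a: (scores[a], -a))


class LatentScoreQueueEnv:
    """Toy latent-score MDP (assumed setup, no arrivals, one server).

    Actions 0..n-1 serve one job from the corresponding queue;
    action n means "idle" and is feasible only when every queue is empty.
    """

    def __init__(self, queue_lengths: Sequence[int], holding_costs: Sequence[float]):
        self.state: List[int] = list(queue_lengths)
        self.costs: List[float] = list(holding_costs)

    def feasible_actions(self) -> List[int]:
        # The feasible set depends on the state: empty queues cannot be served.
        nonempty = [i for i, q in enumerate(self.state) if q > 0]
        return nonempty if nonempty else [len(self.state)]

    def step(self, scores: Sequence[float]) -> Tuple[Tuple[int, ...], float, int]:
        # The agent's "action" is a score vector; decoding happens here,
        # inside the environment, so the learner treats it as a black box.
        action = decode_action(scores, self.feasible_actions())
        if action < len(self.state):
            self.state[action] -= 1
        reward = -sum(c * q for c, q in zip(self.costs, self.state))
        return tuple(self.state), reward, action


if __name__ == "__main__":
    env = LatentScoreQueueEnv(queue_lengths=[2, 0, 1], holding_costs=[1.0, 3.0, 2.0])
    # Queue 1 has the highest score but is empty, so it is never selected.
    scores = [0.1, 5.0, 0.7, 0.0]
    for _ in range(4):
        print(env.step(scores))
```

Because the scores play the role of per-queue indices, a fixed score vector behaves like a static index rule here; a learned policy would instead output state-dependent scores.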
## Key Concepts

- [[bellman-taylor-score-decoding|Bellman–Taylor score decoding]]
- [[latent-score-mdp|latent-score MDP]]
- [[state-dependent-feasible-action-sets|state-dependent feasible action sets]]
- [[action-decoder|action decoder]]
- [[post-action-configuration|post-action configuration]]
- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
- [[queueing-network-control|queueing network control]]
- [[btsd-ppo|BTSD-PPO]]
- [[continuation-value-function|continuation value function]]