20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions


@@ -0,0 +1,28 @@
---
title: "Bellman-Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
source_url: https://arxiv.org/abs/2606.10979
ingested: 2026-06-17
sha256: <computed>
---
# Bellman-Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets
**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA
**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)
## Abstract
Proposes Bellman-Taylor score decoding, a framework that moves policy learning into a Euclidean score space and enforces feasibility through an action decoder that maps each score to an action in the state-dependent feasible set. The design is motivated by a Taylor expansion of the optimal action-value function. Because the policy acts in score space, the induced latent-score MDP can be optimized by standard DRL algorithms without differentiating through the decoder. The paper gives a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. The method is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
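A minimal sketch of the score-to-action idea for a dispatching problem, assuming a decoder that is a masked argmax over per-queue scores. The network layout (`compatibility` matrix), the one-job-per-server rule, and the function name `decode_action` are hypothetical illustrations, not the paper's actual decoder. The policy would output the real-valued `scores` vector as its action, and the decoder runs inside the environment step. As a result, an RL algorithm such as PPO only sees a continuous action space, and no gradient has to pass through the non-differentiable argmax.

```python
import numpy as np


def decode_action(scores, queue_lengths, compatibility):
    """Map a real-valued score vector to a feasible dispatching action.

    Hypothetical decoder (assumption, not the paper's exact rule):
      scores:        shape (K,), one index per queue class, produced by the policy.
      queue_lengths: shape (K,), current number of jobs waiting in each queue.
      compatibility: shape (M, K) boolean; entry [m, k] is True if server m
                     may serve queue k.
    Returns a length-M list whose entry m is the queue index served by
    server m, or None if server m idles because no compatible queue has a job.
    """
    scores = np.asarray(scores, dtype=float)
    compatibility = np.asarray(compatibility, dtype=bool)
    remaining = np.asarray(queue_lengths, dtype=int).copy()
    action = []
    for m in range(compatibility.shape[0]):
        # The feasible set depends on the state: compatible AND non-empty.
        feasible = compatibility[m] & (remaining > 0)
        if not feasible.any():
            action.append(None)
            continue
        # Index rule: pick the feasible queue with the highest score.
        masked = np.where(feasible, scores, -np.inf)
        k = int(np.argmax(masked))
        action.append(k)
        # A job claimed by server m cannot be claimed by a later server.
        remaining[k] -= 1
    return action
```

For example, with two servers, three queues, `scores = [0.2, 5.0, 1.0]`, and `queue_lengths = [3, 0, 1]`, queue 1 has the highest score but is empty, so it is infeasible. Server 0 falls back to queue 0 and server 1 takes queue 2. Every decoded action is feasible regardless of the scores the policy emits.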
## Key Concepts
- [[bellman-taylor-score-decoding|Bellman-Taylor score decoding]]
- [[latent-score-mdp|Latent-score MDP]]
- [[state-dependent-feasible-action-sets|State-dependent feasible action sets]]
- [[action-decoder|Action decoder]]
- [[post-action-configuration|Post-action configuration]]
- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
- [[queueing-network-control|Queueing network control]]
- [[btsd-ppo|BTSD-PPO]]
- [[continuation-value-function|Continuation value function]]