20260617: currently 914 pages

2026-06-17 15:02:40 +08:00
parent e96b955fda
commit 91fac5b6fc
423 changed files with 20687 additions and 34 deletions
--- a/raw/papers/chen-bellman-taylor-score-2026.md
+++ b/raw/papers/chen-bellman-taylor-score-2026.md
@@ -0,0 +1,28 @@
+---
+title: "Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets"
+source_url: https://arxiv.org/abs/2606.10979
+ingested: 2026-06-17
+sha256: <computed>
+---
+
+# Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets
+
+**Authors:** Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA
+
+**arXiv:** 2606.10979v1 [cs.AI] (2026-06-09)
+
+## Abstract
+
+Proposes Bellman–Taylor score decoding, a framework that moves policy learning to a Euclidean score space while enforcing feasibility through an action decoder. The framework is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized by standard DRL algorithms without differentiating through the decoder. Provides a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. Applied to queueing network control, where it learns a state-dependent index-based dispatching rule.
+
+## Key Concepts
+
+- [[bellman-taylor-score-decoding|Bellman-Taylor score decoding]]
+- [[latent-score-mdp|latent-score MDP]]
+- [[state-dependent-feasible-action-sets|state-dependent feasible action sets]]
+- [[action-decoder|action decoder]]
+- [[post-action-configuration|post-action configuration]]
+- [[taylor-expansion-q-function|Taylor expansion of the Q-function]]
+- [[queueing-network-control|queueing network control]]
+- [[btsd-ppo|BTSD-PPO]]
+- [[continuation-value-function|continuation value function]]
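
The score-then-decode idea summarized in the abstract above can be illustrated with a minimal sketch. This is not the paper's decoder, whose exact form is not given in these notes. It assumes a toy dispatching setting in which one server picks a queue to serve, the feasible set is the set of non-empty queues, and the decoder is a plain argmax over feasible scores. The names `feasible_actions` and `decode`, and the idle action `-1`, are hypothetical.

```python
from typing import List, Sequence


def feasible_actions(queue_lengths: Sequence[int]) -> List[int]:
    """Assumed feasibility rule: only non-empty queues can be served."""
    return [i for i, n in enumerate(queue_lengths) if n > 0]


def decode(scores: Sequence[float], queue_lengths: Sequence[int]) -> int:
    """Map an unconstrained score vector to a feasible action.

    Returns the index of the feasible queue with the highest score,
    or -1 (a hypothetical "idle" action) when no queue is feasible.
    """
    if len(scores) != len(queue_lengths):
        raise ValueError("scores and queue_lengths must have the same length")
    feasible = feasible_actions(queue_lengths)
    if not feasible:
        return -1
    # Index-style rule: serve the feasible queue with the largest score.
    return max(feasible, key=lambda i: scores[i])


# Queue 1 has the top score but is empty, so the decoder falls back
# to the best-scoring non-empty queue.
action = decode([2.0, 5.0, 1.0], [3, 0, 4])
```

In this sketch the policy may output any real-valued score vector, and feasibility is handled entirely by `decode`, so a learner acting in the score space never needs a gradient through the argmax.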