title	source_url	ingested	sha256
Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets	https://arxiv.org/abs/2606.10979	2026-06-17	<computed>

Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets

Authors: Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA

arXiv: 2606.10979v1 [cs.AI] (2026-06-09)

Abstract

The paper proposes Bellman–Taylor score decoding, a framework that moves policy learning to a Euclidean score space while enforcing feasibility through an action decoder. The approach is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized by standard deep reinforcement learning (DRL) algorithms without differentiating through the decoder. The paper provides a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. The framework is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
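The abstract does not spell out the form of the decoder, so the following is only a minimal sketch under stated assumptions. It assumes a single-server scheduling setting in which the feasible actions in a state are the non-empty queues, and a decoder that picks the feasible action with the highest score (an index rule). The names `decode` and `feasible_queues`, the feasibility rule, and the example numbers are all hypothetical and are not taken from the paper.

```python
import numpy as np


def feasible_queues(queue_lengths):
    """Hypothetical feasibility rule: a queue can be served only if non-empty."""
    return np.asarray(queue_lengths) > 0


def decode(scores, feasible_mask):
    """Map an unconstrained score vector to a feasible action index.

    Assumed decoder: argmax of the scores restricted to the feasible set.
    The policy only has to output a vector in R^n; feasibility is enforced here.
    """
    scores = np.asarray(scores, dtype=float)
    feasible_mask = np.asarray(feasible_mask, dtype=bool)
    if not feasible_mask.any():
        raise ValueError("no feasible action in this state")
    # Infeasible actions get -inf so they can never win the argmax.
    masked = np.where(feasible_mask, scores, -np.inf)
    return int(np.argmax(masked))


# Example state: queue 1 is empty, so it is infeasible even though
# the policy assigned it the highest score.
queue_lengths = [3, 0, 5]
scores = [0.2, 1.5, 0.7]  # stand-in for the output of a learned policy
action = decode(scores, feasible_queues(queue_lengths))
print(action)  # 2
```

In this sketch the decoder sits between the policy and the environment, so a standard DRL algorithm would treat the score vector as its action and would not need gradients of `decode`, in line with the abstract's claim about not differentiating through the decoder.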
