| title | source_url | ingested | sha256 |
|---|---|---|---|
| Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets | https://arxiv.org/abs/2606.10979 | 2026-06-17 | <computed> |
Bellman–Taylor Score Decoding for MDPs with State-Dependent Feasible Action Sets
Authors: Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo — HKUST, Dept. of IEDA
arXiv: 2606.10979v1 [cs.AI] (2026-06-09)
Abstract
The paper proposes Bellman–Taylor score decoding, a framework that moves policy learning to a Euclidean score space while enforcing feasibility through an action decoder. The framework is motivated by a Taylor expansion of the optimal action-value function. The induced latent-score MDP can then be optimized by standard deep reinforcement learning (DRL) algorithms without differentiating through the decoder. The paper provides a performance guarantee that decomposes the optimality gap into a structural approximation error and an algorithmic learning error. The method is applied to queueing network control, where it learns a state-dependent, index-based dispatching rule.
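
The score-then-decode idea can be illustrated with a minimal Python sketch. This is not the paper's implementation. The argmax decoder, the rule that only non-empty queues are feasible, the toy dynamics, and every name here (`feasible_actions`, `decode`, `LatentScoreEnv`) are assumptions made for illustration. The point it tries to show is that the decoder sits inside the environment step, so a learner only ever emits an unconstrained score vector and never needs gradients through the decoder.

```python
import random
from typing import List, Tuple


def feasible_actions(queues: List[int]) -> List[int]:
    """Assumed feasibility rule: only non-empty queues can be served."""
    return [i for i, q in enumerate(queues) if q > 0]


def decode(scores: List[float], queues: List[int]) -> int:
    """Map an unconstrained score vector to a feasible action.

    Assumed decoder: pick the feasible queue with the highest score
    (an index-style rule). Returns -1 (idle) if nothing is feasible.
    """
    feasible = feasible_actions(queues)
    if not feasible:
        return -1
    return max(feasible, key=lambda i: scores[i])


class LatentScoreEnv:
    """Toy single-server queueing environment whose action is a score vector.

    The decoder is applied inside step(), so from the learner's point of
    view this is an ordinary MDP with a Euclidean action space.
    """

    def __init__(self, arrival_probs: List[float], seed: int = 0) -> None:
        self.arrival_probs = arrival_probs
        self.rng = random.Random(seed)
        self.queues = [0] * len(arrival_probs)

    def reset(self) -> List[int]:
        self.queues = [0] * len(self.arrival_probs)
        return list(self.queues)

    def step(self, scores: List[float]) -> Tuple[List[int], float, int]:
        action = decode(scores, self.queues)
        if action >= 0:
            self.queues[action] -= 1  # serve one job from the chosen queue
        for i, p in enumerate(self.arrival_probs):
            if self.rng.random() < p:
                self.queues[i] += 1  # Bernoulli arrival (toy assumption)
        reward = -float(sum(self.queues))  # holding cost as negative reward
        return list(self.queues), reward, action


if __name__ == "__main__":
    env = LatentScoreEnv([0.3, 0.5, 0.2], seed=1)
    state = env.reset()
    for _ in range(5):
        scores = [random.gauss(0.0, 1.0) for _ in state]  # stand-in for a policy
        state, reward, action = env.step(scores)
        print(state, reward, action)
```

In this sketch, any off-the-shelf continuous-action DRL algorithm could in principle be pointed at `LatentScoreEnv`, since infeasible actions can never be executed regardless of the scores it receives.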