---
title: "Representation Learning Enables Scalable Multitask Deep RL"
source: "arXiv:2606.05555v1"
authors: "Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro"
affiliation: "Mila, Universite de Montreal, McGill, Google DeepMind"
year: 2026
category: "cs.LG, cs.AI"
published: "2026-06-04"
---

# Representation Learning Enables Scalable Multitask Deep RL

**Authors**: Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro

**arXiv**: 2606.05555v1 [cs.LG, cs.AI]

**Affiliations**: Mila / UdeM / McGill / CIFAR / Google DeepMind

**Published**: 2026-06-04

## Abstract

Scaling RL to diverse multitask settings is a central challenge. We argue that the primary driver of scalable multitask performance is not model-based control but **representation learning**: combining predictive, model-based representations with high-capacity value function approximation is sufficient, even without planning. MR.Q, a model-free algorithm with auxiliary predictive objectives, outperforms world-model-based methods such as Newt while reducing computational overhead and improving wall-clock efficiency.

## Key Concepts

- [[predictive-representation-learning|Predictive Representation Learning]] — core thesis
- [[mrq-algorithm|MR.Q]] — the model-free agent with predictive objectives
- [[multitask-rl|Multitask RL]] — training across diverse task distributions
- [[representation-learning-rl|Representation Learning in RL]] — beyond reward-only supervision
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]] — dynamics/reward/termination prediction
- [[world-models-rl|World Models in RL]] — model-based comparison point
- [[model-free-rl|Model-Free RL]] — the advocated approach
- [[deep-rl-scaling|Scaling Deep RL]] — the broader goal

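
The core claim, a model-free agent trained on a value objective plus auxiliary dynamics/reward/termination prediction that acts without planning, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption for illustration, not MR.Q's implementation: linear/tanh layers, mean-squared error for the dynamics and reward heads, binary cross-entropy for termination, a one-step TD target, a deterministic policy head, and the hypothetical loss weights `W_DYN`, `W_REW`, `W_TERM`. The sketch only assembles the loss terms on a shared representation; an actual training step would also need gradients (for example via an autodiff library).

```python
import numpy as np

# All sizes, loss weights, and the linear/tanh architecture are hypothetical,
# chosen only to make the structure of the objective visible (not from the paper).
OBS_DIM, ACT_DIM, LATENT_DIM = 8, 2, 16
GAMMA = 0.99
W_DYN, W_REW, W_TERM = 1.0, 0.1, 0.1  # assumed auxiliary loss weights


def init_params(rng):
    """Linear weights for every head; a real agent would use deep networks."""
    def w(shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "enc": w((OBS_DIM, LATENT_DIM)),                  # observation -> z_s
        "enc_sa": w((LATENT_DIM + ACT_DIM, LATENT_DIM)),  # (z_s, a) -> z_sa
        "dyn": w((LATENT_DIM, LATENT_DIM)),               # z_sa -> predicted next z_s
        "rew": w((LATENT_DIM, 1)),                        # z_sa -> predicted reward
        "term": w((LATENT_DIM, 1)),                       # z_sa -> termination logit
        "q": w((LATENT_DIM, 1)),                          # z_sa -> Q-value
        "pi": w((LATENT_DIM, ACT_DIM)),                   # z_s -> action
    }


def encode_state(p, obs):
    return np.tanh(obs @ p["enc"])


def encode_state_action(p, z_s, action):
    return np.tanh(np.concatenate([z_s, action], axis=-1) @ p["enc_sa"])


def act(p, obs):
    """Model-free action selection: one forward pass, no planning or rollouts."""
    return np.tanh(encode_state(p, obs) @ p["pi"])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def losses(p, obs, action, reward, next_obs, done):
    """Value loss plus dynamics/reward/termination prediction losses.

    reward and done have shape (batch, 1); done is 1.0 at terminal transitions.
    """
    z_s = encode_state(p, obs)
    z_sa = encode_state_action(p, z_s, action)

    # Auxiliary predictive objectives shaping the shared representation.
    # A real implementation would typically use a frozen target encoder here.
    z_next = encode_state(p, next_obs)
    dyn_loss = np.mean((z_sa @ p["dyn"] - z_next) ** 2)
    rew_loss = np.mean((z_sa @ p["rew"] - reward) ** 2)
    prob = np.clip(sigmoid(z_sa @ p["term"]), 1e-7, 1 - 1e-7)
    term_loss = -np.mean(done * np.log(prob) + (1 - done) * np.log(1 - prob))

    # One-step TD loss for the value head, bootstrapping from the policy's next action.
    next_action = np.tanh(z_next @ p["pi"])
    q_next = encode_state_action(p, z_next, next_action) @ p["q"]
    target = reward + GAMMA * (1.0 - done) * q_next
    td_loss = np.mean((z_sa @ p["q"] - target) ** 2)

    total = td_loss + W_DYN * dyn_loss + W_REW * rew_loss + W_TERM * term_loss
    return {"td": td_loss, "dyn": dyn_loss, "rew": rew_loss,
            "term": term_loss, "total": total}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = init_params(rng)
    batch = 32
    obs = rng.normal(size=(batch, OBS_DIM))
    action = act(p, obs)
    reward = rng.normal(size=(batch, 1))
    next_obs = rng.normal(size=(batch, OBS_DIM))
    done = (rng.random(size=(batch, 1)) < 0.05).astype(float)
    for name, value in losses(p, obs, action, reward, next_obs, done).items():
        print(f"{name}: {value:.4f}")
```

In this sketch the predictive heads only exist to shape `z_sa` during training; acting calls `act` once per step. A planning agent would instead roll a learned model forward over candidate action sequences before each decision, which is the kind of overhead the abstract says the model-free approach avoids.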