---
title: "Representation Learning Enables Scalable Multitask Deep RL"
source: "arXiv:2606.05555v1"
authors: "Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro"
affiliation: "Mila, Université de Montréal, McGill, Google DeepMind"
year: 2026
category: "cs.LG, cs.AI"
published: "2026-06-04"
---

# Representation Learning Enables Scalable Multitask Deep RL

**Authors**: Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro
**arXiv**: 2606.05555v1 [cs.LG, cs.AI]
**Affiliations**: Mila / UdeM / McGill / CIFAR / Google DeepMind
**Published**: 2026-06-04

## Abstract

Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. We argue that the primary driver of scalable multitask performance is not model-based control but **representation learning**: combining predictive, model-based representations with high-capacity value function approximation is sufficient, even without planning. MR.Q, a model-free algorithm trained with auxiliary predictive objectives, outperforms world-model-based methods such as Newt while reducing computational overhead and improving wall-clock efficiency.

## Key Concepts
- [[predictive-representation-learning|Predictive Representation Learning]] — core thesis
- [[mrq-algorithm|MR.Q]] — the model-free agent with predictive objectives
- [[multitask-rl|Multitask RL]] — training across diverse task distributions
- [[representation-learning-rl|Representation Learning in RL]] — beyond reward-only supervision
- [[auxiliary-predictive-objectives|Auxiliary Predictive Objectives]] — dynamics/reward/termination prediction
- [[world-models-rl|World Models in RL]] — model-based comparison point
- [[model-free-rl|Model-Free RL]] — the advocated approach
- [[deep-rl-scaling|Scaling Deep RL]] — the broader goal
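
To make the "auxiliary predictive objectives" entry above concrete, here is a minimal sketch of a combined dynamics/reward/termination prediction loss for a single transition. It is an illustration only, not the paper's implementation: the linear encoder and heads, the squared-error and binary cross-entropy losses, the `params` layout, and the name `auxiliary_predictive_loss` are all assumptions made for this note.

```python
import math


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear(W, x):
    """Apply a weight matrix (given as a list of rows) to a vector."""
    return [dot(row, x) for row in W]


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def auxiliary_predictive_loss(params, obs, action, reward, next_obs, done,
                              weights=(1.0, 1.0, 1.0)):
    """Weighted sum of dynamics, reward, and termination prediction losses.

    Hypothetical layout: params["encoder"] and params["dynamics"] are
    matrices, params["reward"] and params["termination"] are vectors.
    """
    # Encode the current and next observations into latent vectors.
    z = linear(params["encoder"], obs)
    # Assumption: a real agent would treat this target as a constant
    # (no gradient); this sketch only evaluates the loss value.
    z_next_target = linear(params["encoder"], next_obs)

    # All three heads read the latent state concatenated with the action.
    za = z + list(action)
    z_next_pred = linear(params["dynamics"], za)
    r_pred = dot(params["reward"], za)
    d_logit = dot(params["termination"], za)

    parts = {
        # Mean squared error between predicted and encoded next latent.
        "dynamics": sum((p - t) ** 2
                        for p, t in zip(z_next_pred, z_next_target))
                    / len(z_next_target),
        # Squared error on the scalar reward (an assumed loss choice).
        "reward": (r_pred - reward) ** 2,
        # Cross-entropy on the episode-termination flag.
        "termination": bce_with_logit(d_logit, float(done)),
    }
    w_dyn, w_rew, w_term = weights
    total = (w_dyn * parts["dynamics"]
             + w_rew * parts["reward"]
             + w_term * parts["termination"])
    return total, parts
```

In a setup like the one the abstract describes, a term of this kind would be used to train the encoder while action selection stays model-free, so the prediction heads are never used for planning. How the paper parameterizes or weights these terms is not recorded in this note.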