PWM: Policy Learning with Large World Models

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

Project website | Paper | Models & Datasets

Overview

Instead of building world models into algorithms, we propose using large-scale multi-task world models as differentiable simulators for policy learning. When well-regularized, these models enable efficient policy learning with first-order gradient optimization. This allows PWM to learn to solve 80 tasks in under 10 minutes each, without expensive online planning.
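To make the idea of first-order gradient optimization through a world model concrete, here is a minimal sketch in plain Python. It is not the PWM implementation: the scalar linear "world model" (`A`, `B`), the linear policy `a = -k * s`, the reward `-s^2`, and all function names are assumptions chosen for illustration, and the derivative is propagated by hand where a real system would rely on automatic differentiation over neural networks.

```python
# Toy illustration (hypothetical, not the PWM code): treat a fixed model
# s' = A*s + B*a as a differentiable simulator and improve a linear policy
# a = -k*s by backpropagating the return through the model rollout.

def rollout_return_and_grad(k, s0=1.0, horizon=5, A=1.0, B=0.5):
    """Return the rollout return and its exact derivative w.r.t. the policy gain k."""
    s, ds_dk = s0, 0.0
    ret, dret_dk = 0.0, 0.0
    for _ in range(horizon):
        a = -k * s                       # policy action
        da_dk = -s - k * ds_dk           # chain rule through the policy
        s_next = A * s + B * a           # world model step
        ds_next_dk = A * ds_dk + B * da_dk  # chain rule through the model
        s, ds_dk = s_next, ds_next_dk
        ret += -s * s                    # reward: keep the state near zero
        dret_dk += -2.0 * s * ds_dk
    return ret, dret_dk


def train(k=0.0, lr=0.05, steps=200):
    """Plain gradient ascent on the return, using the first-order gradient."""
    history = []
    for _ in range(steps):
        ret, grad = rollout_return_and_grad(k)
        history.append(ret)
        k += lr * grad
    return k, history


if __name__ == "__main__":
    k, history = train()
    print(f"learned gain k = {k:.3f}")
    print(f"return: first = {history[0]:.3f}, last = {history[-1]:.6f}")
```

In this toy setting the gain that drives the state to zero in one step is `k = A / B = 2`, so the learned gain should approach that value. No planning is performed at any point; the policy is improved purely by following gradients that flow through the model.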

Structure of repository

pwm
├── dflex
│   ├── data - data used for dflex world model pre-training
│   └── pretrained - already trained world models that can be used in dflex experiments
├── multitask - pre-trained world models for multitask evaluation
├── pedagogical - pre-trained world models for recreating pedagogical examples
└── README.md