Cube-Double FQL Offline Checkpoint

Offline-trained Flow Q-Learning (FQL) agent for the OGBench cube-double-play-singletask-v0 environment.

Files

params_1000000.pkl: checkpoint after 1M offline training steps. This is the standard starting point for online fine-tuning experiments (success rate ~30-40%).
params_2000000.pkl: checkpoint after 2M offline training steps (fully converged offline policy).
flags.json: full training config (alpha, hidden dims, batch size, ...).
train.csv, eval.csv: full training and evaluation metrics from the offline run.
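To check a checkpoint's logged offline performance before fine-tuning it, you can read eval.csv with the standard library. This is a minimal sketch: the column names `step` and `evaluation/success` are hypothetical defaults, so check the CSV header and pass the real names if they differ.

```python
import csv
from typing import Optional


def success_at_step(
    csv_path: str,
    step: int,
    step_col: str = "step",  # hypothetical column name; check the CSV header
    success_col: str = "evaluation/success",  # hypothetical column name
) -> Optional[float]:
    """Return the logged success value at `step`, or None if that step was not logged."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            raw_step = row.get(step_col)
            # Steps may be written as "1000000" or "1000000.0"; compare numerically.
            if raw_step and int(float(raw_step)) == step:
                value = row.get(success_col)
                return float(value) if value not in (None, "") else None
    return None
```

For example, `success_at_step("eval.csv", 1_000_000)` should give the evaluation success logged at the 1M-step checkpoint, if the column names match.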

Loading

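import pickle
from flax import serialization

# `agent` must already be an FQL agent built with the config in flags.json;
# from_state_dict uses it as the target structure and fills in the saved values.
with open("params_1000000.pkl", "rb") as f:
    load_dict = pickle.load(f)
agent = serialization.from_state_dict(agent, load_dict["agent"])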
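If from_state_dict fails with a structure mismatch, it helps to see what the pickle actually contains. The sketch below (the helper names are hypothetical, not part of any codebase) flattens load_dict["agent"] into path to shape pairs so you can compare it against the agent you constructed. It uses only the standard library, but unpickling the real checkpoint will likely still require jax/flax to be installed, since the saved leaves are array objects.

```python
import pickle
from typing import Any, Dict


def summarize_state_dict(tree: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested state dict into {"a/b/c": shape} for quick inspection.

    Leaves without a `.shape` attribute (e.g. Python ints) map to their type name.
    """
    summary: Dict[str, Any] = {}
    if isinstance(tree, dict):
        for key, value in tree.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            summary.update(summarize_state_dict(value, path))
    else:
        shape = getattr(tree, "shape", None)
        summary[prefix] = tuple(shape) if shape is not None else type(tree).__name__
    return summary


def load_checkpoint_summary(path: str) -> Dict[str, Any]:
    """Unpickle a params_*.pkl file and summarize its "agent" entry."""
    with open(path, "rb") as f:
        load_dict = pickle.load(f)
    return summarize_state_dict(load_dict["agent"])
```

Printing `load_checkpoint_summary("params_1000000.pkl")` next to the same summary of `serialization.to_state_dict(agent)` makes missing or mis-shaped entries easy to spot.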

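Alternatively, in the Robo_Continual_Learning codebase you can restore this checkpoint directly by passing --restore_path=<dir> --restore_epoch=1000000, where <dir> is the directory containing params_1000000.pkl.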
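A minimal sketch of how the two flags plausibly map to a checkpoint file. The params_<epoch>.pkl naming is inferred from the files in this repository; treating restore_path as a glob pattern that must match exactly one directory is an assumption, so check the codebase's own restore logic before relying on it. The function name is hypothetical.

```python
import glob
import os


def resolve_checkpoint(restore_path: str, restore_epoch: int) -> str:
    """Map --restore_path / --restore_epoch to a params_<epoch>.pkl file path.

    Assumes restore_path may be a glob pattern matching exactly one directory.
    """
    candidates = glob.glob(restore_path)
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one match for {restore_path!r}, found {candidates}")
    ckpt = os.path.join(candidates[0], f"params_{restore_epoch}.pkl")
    if not os.path.isfile(ckpt):
        raise FileNotFoundError(ckpt)
    return ckpt
```

Failing loudly on zero or multiple matches avoids silently restoring the wrong run when several experiment directories share a prefix.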

Training config (excerpt)

Agent: FQL (alpha=300, flow_steps=10, hidden_dims=512x4)
Env: cube-double-play-singletask-v0 (obs_dim=37, action_dim=5)
Offline steps: 2M, seed=0
Offline dataset: OGBench cube-double-play-singletask-v0
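For intuition on what flow_steps controls: in FQL, the behavior-cloning flow policy produces an action by integrating a learned velocity field from Gaussian noise over a fixed number of Euler steps, and alpha weights the distillation term that keeps the one-step policy close to that flow policy. The sketch below shows only the integration loop; `velocity` stands in for the trained network (a hypothetical callable here), and the final clip to [-1, 1] assumes a normalized action space.

```python
from typing import Callable, List

Vector = List[float]


def euler_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    noise: Vector,
    flow_steps: int = 10,
) -> Vector:
    """Integrate da/dt = velocity(a, t) from t=0 (noise) to t=1 with fixed-step Euler.

    The result is clipped to [-1, 1], assuming a normalized action space.
    """
    actions = list(noise)
    dt = 1.0 / flow_steps
    for i in range(flow_steps):
        t = i * dt  # left endpoint of the i-th time interval
        vel = velocity(actions, t)
        actions = [a + dt * v for a, v in zip(actions, vel)]
    return [min(max(a, -1.0), 1.0) for a in actions]
```

With flow_steps=10, each sample costs ten network evaluations; more steps integrate the flow more accurately at proportionally higher cost.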


Reinforcement Learning