Cube-Double FQL Offline Checkpoint
Offline-trained Flow Q-Learning (FQL) agent for the OGBench
cube-double-play-singletask-v0 environment.
Files
- params_1000000.pkl: checkpoint after 1M offline steps. This is the standard starting point for online fine-tuning experiments (success rate ~30-40%).
- params_2000000.pkl: checkpoint after 2M offline steps (fully converged offline policy).
- flags.json: full training config (alpha, hidden dims, batch size, ...).
- train.csv, eval.csv: full training metrics from the offline run.
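The config and metrics files can be read with the Python standard library alone. The following is a minimal sketch, not part of the release: the helper names `load_flags` and `last_row` are hypothetical, and because the column names in train.csv and eval.csv are not documented here, the code makes no assumption about them and returns whatever columns are present.

```python
import csv
import json


def load_flags(path):
    """Return the training config stored in a flags.json file as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def last_row(path):
    """Return the final row of a metrics CSV as a {column: string} dict.

    Returns None if the file has a header but no data rows.
    Values are left as strings; convert them as needed.
    """
    with open(path, "r", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None
```

For example, `sorted(load_flags("flags.json"))` lists the top-level config keys, and `last_row("eval.csv")` returns the most recent evaluation entry, assuming both files are in the current directory.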
Loading
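import pickle
import flax.serialization

# `agent` must already exist with the same structure as the saved one
# (e.g. created from the config in flags.json); it is the restore target.
with open("params_1000000.pkl", "rb") as f:
    load_dict = pickle.load(f)
agent = flax.serialization.from_state_dict(agent, load_dict["agent"])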
In the Robo_Continual_Learning
codebase you can also pass --restore_path=<dir> --restore_epoch=1000000.
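Before restoring into an agent, it can help to check what the pickle actually contains. The sketch below assumes that the "agent" entry is a nested dict whose leaves are arrays or plain Python values. The helpers `summarize_tree` and `summarize_checkpoint` are hypothetical and not part of any codebase mentioned here. Unpickling requires the libraries whose objects were saved to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle


def summarize_tree(tree, prefix=""):
    """Flatten a nested dict into {"a/b/c": description} for quick inspection.

    Leaves with a `.shape` attribute (e.g. NumPy or JAX arrays) are described
    by their shape; any other leaf is described by its type name.
    """
    summary = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the slash-separated path.
            summary.update(summarize_tree(value, path))
        elif hasattr(value, "shape"):
            summary[path] = tuple(value.shape)
        else:
            summary[path] = type(value).__name__
    return summary


def summarize_checkpoint(path):
    """Load a pickled checkpoint and summarize its "agent" entry."""
    with open(path, "rb") as f:
        load_dict = pickle.load(f)
    return summarize_tree(load_dict["agent"])
```

Calling `summarize_checkpoint("params_1000000.pkl")` and printing the result one entry per line gives a quick view of the parameter tree that can be compared against a freshly constructed agent when a restore fails on a structure mismatch.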
Training config (excerpt)
- Agent: FQL (alpha=300, flow_steps=10, hidden_dims=512x4)
- Env: cube-double-play-singletask-v0 (obs_dim=37, action_dim=5)
- Offline steps: 2M, seed=0
- Offline dataset: OGBench cube-double-play-singletask-v0