David Quarel committed on
Commit c5fd14a · 1 Parent(s): a4216e4
Add README.md and train.yaml
Browse files:
- README.md +35 -0
- train.yaml +49 -0
README.md
ADDED
@@ -0,0 +1,35 @@
# jaxgmg2_shared_init

A collection of RL agent checkpoints studying the effect of shared initialization. Two base models (run IDs 19 and 27 from `jaxgmg2_3phase_optim_state`) are each used as a shared starting point, then training is independently continued from checkpoint 0 (with a fresh optimizer state) at α=1.0 for 10 different random seeds per base model.

## Training Configuration

- **Environment**: JaxGMG open maze, cheese at any location, 9600 levels
- **Algorithm**: REINFORCE with value function baseline
- **Alpha (α)**: 1.0
- **Discount rate (γ)**: 0.98
- **Learning rate**: 5e-5
- **Total env steps**: 1,351,680,000 (~1.35B; roughly 21k gradient steps, see the sketch below)
- **Rollout steps**: 64
- **Base models**: `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019` and `...id_27_seed_981027`
- **Resume optimizer**: No (fresh optimizer at checkpoint 0)
- **Seeds per base model**: 30–39
- **Optimizer state saved**: Yes
## Naming Schema

Checkpoints are named `al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}`.
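
Combining the schema with the run IDs and seeds listed above gives the full set of 20 expected run names:

```python
# Enumerate the expected run names: 2 base models (ids 19, 27) x 10 seeds (30-39).
template = "al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}"
names = [
    template.format(run_id=run_id, seed=seed)
    for run_id in (19, 27)
    for seed in range(30, 40)
]
print(len(names))  # 20
print(names[0])    # al_1.0_g_0.98_id_19_shared_init_seed_30
```
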
## Reproduced with

See `train.yaml` in this repository. Run with:

```bash
make run projects/rl/experiments/shared_init/jobs/train.yaml
```

from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
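
Since the runs also push to Hugging Face (`use_hf: true` in `train.yaml`), a minimal sketch for fetching one run's checkpoints locally with `huggingface_hub` is shown below. The `repo_id` and the per-run directory pattern are assumptions, not taken from this README; substitute the actual repository and layout.

```python
# Fetch one run's checkpoint files from the Hugging Face Hub.
# ASSUMPTIONS: repo_id and the per-run directory layout are hypothetical.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="timaeus/jaxgmg2_shared_init",  # hypothetical repo id
    allow_patterns=["al_1.0_g_0.98_id_19_shared_init_seed_30/*"],
)
print(local_dir)
```
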
## WandB

Project: `jaxgmg2_shared_init`
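
For pulling run metadata or metrics programmatically, a minimal sketch using the public WandB API is given below; the entity (`timaeus`) is an assumption and should be replaced with the entity that actually owns the project.

```python
# List the runs logged to the WandB project named above.
# ASSUMPTION: the "timaeus" entity is hypothetical.
import wandb

api = wandb.Api()
for run in api.runs("timaeus/jaxgmg2_shared_init"):
    print(run.name, run.state)
```
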
train.yaml
ADDED
@@ -0,0 +1,49 @@
parameters:
  project_name: jaxgmg2_shared_init
  action: rl
  rl_action: train

  # Learning
  lr: 5e-5
  alpha: 1.0
  discount_rate: 0.98
  cheese_loc: any
  env_layout: open

  # Training scale
  num_total_env_steps: 1_351_680_000
  num_levels: 9600
  grad_acc_per_chunk: 4
  num_rollout_steps: 64

  # Resume from checkpoint 0 (shared initialisation, fresh optimizer)
  resume_id: 0
  resume_optim: false

  # Checkpointing
  ckpt_dir: jaxgmg2_shared_init
  f_str_ckpt: "al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}"
  eval_schedule: "0:1,250:2,500:5,2000:10"
  log_optimizer_state: true

  # Logging
  use_wandb: true
  use_hf: true
  wandb_project: jaxgmg2_shared_init

sweep:
  - - resume: jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019
      run_id: 19
    - resume: jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_27_seed_981027
      run_id: 27

  - - seed: 30
    - seed: 31
    - seed: 32
    - seed: 33
    - seed: 34
    - seed: 35
    - seed: 36
    - seed: 37
    - seed: 38
    - seed: 39
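
Assuming the nested `sweep` lists denote a Cartesian product of option groups (each inner list being one axis of choices), the config expands to 2 base models × 10 seeds = 20 runs; the actual expansion semantics are defined by the timaeus monorepo runner. A minimal sketch of that reading:

```python
# Expand the sweep as a Cartesian product of the two option groups above:
# 2 base models x 10 seeds = 20 runs. This mirrors a common sweep convention;
# the actual semantics are defined by the timaeus monorepo runner.
from itertools import product

base_models = [
    {"resume": "jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019", "run_id": 19},
    {"resume": "jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_27_seed_981027", "run_id": 27},
]
seeds = [{"seed": s} for s in range(30, 40)]

runs = [{**base, **seed} for base, seed in product(base_models, seeds)]
print(len(runs))  # 20
print(runs[0])    # {'resume': '...id_19_seed_981019', 'run_id': 19, 'seed': 30}
```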