Update README and train.yaml for the full alpha/discount_rate grid; add train_missing.yaml

Files changed (3)

README.md CHANGED

@@ -1,22 +1,30 @@
 # jaxgmg2_3phase_unique
-15 RL agent checkpoints trained on the JaxGMG maze environment with alpha=0.6 and discount_rate=0.98.
-First batch of the "3-phase" training runs, without optimizer state saved (see jaxgmg2_3phase_optim_state for the version with optimizer state).
 **WandB:** https://wandb.ai/devinterp/jaxgmg2_3phase_unique
 ## Sweep
-run_id sweep: 0-14. Seed is derived from run_id via:
 `seed = int(discount_rate*100)*10000 + int(alpha*10)*100 + run_id`
-e.g. run_id=0 -> seed=980600, run_id=14 -> seed=980614.
 ## Shared Hyperparams
 ```
 rl_action=train
-alpha=0.6
-discount_rate=0.98
 lr=5e-05
 num_total_env_steps=10000000000
 num_rollout_steps=64
@@ -39,14 +47,20 @@ use_hf=True
 ## Naming Schema
-Checkpoints are named `al_0.6_g_0.98_id_{run_id}_seed_{seed}`.
 ## Reproduced with
 See `train.yaml` in this repository. Run with:
 ```bash
-make run projects/rl/experiments/al_0.6_g_0.98/jobs/train_unique.yaml
 ```
 from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).

 # jaxgmg2_3phase_unique
+224 RL agent checkpoints trained on the JaxGMG maze environment across a grid of
+alpha and discount_rate values, saved without optimizer state (see
+jaxgmg2_3phase_optim_state for the version that includes optimizer state).
 **WandB:** https://wandb.ai/devinterp/jaxgmg2_3phase_unique
 ## Sweep
+Grid over alpha x discount_rate x run_id (0-14):
+```
+alpha:         {0.4, 0.5, 0.6, 0.7, 1.0}
+discount_rate: {0.97, 0.98, 0.99}
+run_id:        0-14
+```
+5 x 3 x 15 = 225 combinations. One run is missing: `al_0.7_g_0.98_id_14_seed_980714`.
+The seed is derived from discount_rate, alpha, and run_id via:
 `seed = int(discount_rate*100)*10000 + int(alpha*10)*100 + run_id`
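As a quick sanity check, a minimal sketch (assuming the grid values listed above and standard Python float semantics) can confirm that this formula gives each of the 225 grid points a distinct seed, including 980714 for the missing run. The helper name `seed_for` is hypothetical and not part of the timaeus codebase.

```python
from itertools import product

ALPHAS = [0.4, 0.5, 0.6, 0.7, 1.0]
DISCOUNT_RATES = [0.97, 0.98, 0.99]
RUN_IDS = range(15)  # run_id 0-14


def seed_for(alpha: float, discount_rate: float, run_id: int) -> int:
    """README seed formula (hypothetical helper name)."""
    return int(discount_rate * 100) * 10000 + int(alpha * 10) * 100 + run_id


seeds = {
    (a, g, r): seed_for(a, g, r)
    for a, g, r in product(ALPHAS, DISCOUNT_RATES, RUN_IDS)
}

# int() truncates rather than rounds, so a float product landing just below
# an integer would be off by one; confirm truncation equals rounding here.
for a, g, _ in seeds:
    assert int(a * 10) == round(a * 10)
    assert int(g * 100) == round(g * 100)

print(len(seeds), len(set(seeds.values())))  # 225 225
print(seed_for(0.7, 0.98, 14))               # 980714
print(seed_for(1.0, 0.99, 0))                # 991000
```

Each field occupies its own digit range (run_id in the units and tens, alpha in the hundreds and thousands, discount_rate above that), so seeds stay distinct as long as run_id and alpha*10 are both below 100.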
 ## Shared Hyperparams
 ```
 rl_action=train
 lr=5e-05
 num_total_env_steps=10000000000
 num_rollout_steps=64
 ## Naming Schema
+Checkpoints are named `al_{alpha}_g_{discount_rate}_id_{run_id}_seed_{seed}`.
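The naming schema is regular enough to invert, which is one way to find gaps in an upload. A minimal sketch, assuming names follow the pattern above exactly and that alpha and discount_rate are rendered with Python's default float formatting (e.g. `1.0`, not `1`); the helpers `ckpt_name`, `parse_ckpt_name`, and `missing_runs` are hypothetical:

```python
import re
from itertools import product

ALPHAS = [0.4, 0.5, 0.6, 0.7, 1.0]
DISCOUNT_RATES = [0.97, 0.98, 0.99]
RUN_IDS = range(15)

# al_{alpha}_g_{discount_rate}_id_{run_id}_seed_{seed}
CKPT_RE = re.compile(
    r"al_(?P<alpha>\d+\.\d+)_g_(?P<discount_rate>\d+\.\d+)"
    r"_id_(?P<run_id>\d+)_seed_(?P<seed>\d+)"
)


def seed_for(alpha: float, discount_rate: float, run_id: int) -> int:
    return int(discount_rate * 100) * 10000 + int(alpha * 10) * 100 + run_id


def ckpt_name(alpha: float, discount_rate: float, run_id: int) -> str:
    seed = seed_for(alpha, discount_rate, run_id)
    return f"al_{alpha}_g_{discount_rate}_id_{run_id}_seed_{seed}"


def parse_ckpt_name(name: str) -> tuple[float, float, int, int]:
    m = CKPT_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a checkpoint name: {name!r}")
    alpha, discount_rate = float(m["alpha"]), float(m["discount_rate"])
    run_id, seed = int(m["run_id"]), int(m["seed"])
    # Reject names whose seed disagrees with the documented formula.
    if seed != seed_for(alpha, discount_rate, run_id):
        raise ValueError(f"seed mismatch in {name!r}")
    return alpha, discount_rate, run_id, seed


def missing_runs(present: set[str]) -> list[str]:
    """Names in the full grid that do not appear in `present`."""
    return [
        ckpt_name(a, g, r)
        for a, g, r in product(ALPHAS, DISCOUNT_RATES, RUN_IDS)
        if ckpt_name(a, g, r) not in present
    ]


# Usage: simulate the uploaded set with the one known gap.
present = {ckpt_name(a, g, r) for a, g, r in product(ALPHAS, DISCOUNT_RATES, RUN_IDS)}
present.discard("al_0.7_g_0.98_id_14_seed_980714")
print(len(present))           # 224
print(missing_runs(present))  # ['al_0.7_g_0.98_id_14_seed_980714']
print(parse_ckpt_name("al_0.7_g_0.98_id_14_seed_980714"))  # (0.7, 0.98, 14, 980714)
```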
 ## Reproduced with
 See `train.yaml` in this repository. Run with:
 ```bash
+timaeus run train.yaml
+```
+To train the one missing run:
+```bash
+timaeus run train_missing.yaml
 ```
 from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).

train.yaml CHANGED

@@ -4,12 +4,11 @@ parameters:
   rl_action: train
   lr: 5e-5
-  alpha: 0.6
-  discount_rate: 0.98
   cheese_loc: any
   env_layout: open
   mask_type: first_episode
   use_prev_action: false
   num_total_env_steps: 10_000_000_000
   num_levels: 9600
@@ -28,6 +27,16 @@ parameters:
   ntfy: david_jaxgmg
 sweep:
 - - run_id: 0
   - run_id: 1
   - run_id: 2

   rl_action: train
   lr: 5e-5
   cheese_loc: any
   env_layout: open
   mask_type: first_episode
   use_prev_action: false
+  log_optimizer_state: false
   num_total_env_steps: 10_000_000_000
   num_levels: 9600
   ntfy: david_jaxgmg
 sweep:
+- - alpha: 0.4
+  - alpha: 0.5
+  - alpha: 0.6
+  - alpha: 0.7
+  - alpha: 1.0
+- - discount_rate: 0.97
+  - discount_rate: 0.98
+  - discount_rate: 0.99
 - - run_id: 0
   - run_id: 1
   - run_id: 2
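One reading of the nested `sweep` list is a Cartesian product: each outer item is an axis and each inner item is one value on that axis. This interpretation is an assumption inferred from the README's "alpha x discount_rate x run_id" grid, not taken from the timaeus source. The sketch below uses a hypothetical `expand_sweep` helper and assumes the run_id axis continues to 14 beyond the lines shown in this diff.

```python
from itertools import product


def expand_sweep(base: dict, sweep: list[list[dict]]) -> list[dict]:
    """Expand a nested sweep spec into one parameter dict per run.

    Assumed semantics: each outer list is an axis, each inner dict one value
    on that axis; runs are the Cartesian product of all axes.
    """
    runs = []
    for combo in product(*sweep):  # pick one override from every axis
        params = dict(base)
        for override in combo:
            params.update(override)
        runs.append(params)
    return runs


# Shared parameters (abridged from train.yaml).
base = {"rl_action": "train", "lr": 5e-5, "log_optimizer_state": False}

# train.yaml: alpha x discount_rate x run_id (run_id assumed to run 0-14).
full_sweep = [
    [{"alpha": a} for a in (0.4, 0.5, 0.6, 0.7, 1.0)],
    [{"discount_rate": g} for g in (0.97, 0.98, 0.99)],
    [{"run_id": r} for r in range(15)],
]
runs = expand_sweep(base, full_sweep)
print(len(runs))  # 225

# train_missing.yaml: alpha and discount_rate pinned in parameters,
# leaving a single-axis sweep with one value.
missing = expand_sweep({**base, "alpha": 0.7, "discount_rate": 0.98}, [[{"run_id": 14}]])
print(len(missing), missing[0]["alpha"], missing[0]["discount_rate"], missing[0]["run_id"])
# 1 0.7 0.98 14
```

Under this reading, moving alpha and discount_rate from `parameters` into `sweep` is what turns the old 15-run sweep into 225 runs, while train_missing.yaml does the reverse and collapses the product to a single run.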

train_missing.yaml ADDED

+parameters:
+  project_name: jaxgmg2_3phase_unique
+  action: rl
+  rl_action: train
+  lr: 5e-5
+  alpha: 0.7
+  discount_rate: 0.98
+  cheese_loc: any
+  env_layout: open
+  mask_type: first_episode
+  use_prev_action: false
+  log_optimizer_state: false
+  num_total_env_steps: 10_000_000_000
+  num_levels: 9600
+  grad_acc_per_chunk: 5
+  num_rollout_steps: 64
+  seed_formula: "{int(discount_rate*100):02d}{int(alpha*10):02d}{run_id:02d}"
+  ckpt_dir: jaxgmg2_3phase_unique
+  f_str_ckpt: "al_{alpha}_g_{discount_rate}_id_{run_id}_seed_{seed}"
+  eval_schedule: "0:1,250:2,500:5,2000:10"
+  wandb_project: jaxgmg2_3phase_unique
+  use_wandb: true
+  use_hf: true
+  no_tqdm: true
+  ntfy: david_jaxgmg
+sweep:
+- - run_id: 14
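Here `seed_formula` is a zero-padded string template, while the README states the seed arithmetically. Assuming the template is evaluated like a Python f-string and the result cast to `int` (how timaeus actually evaluates it is not shown in this diff), a small sketch can check that both forms agree on every grid point and that `f_str_ckpt` reproduces the missing checkpoint's name. The function names are hypothetical.

```python
from itertools import product


def seed_from_template(alpha: float, discount_rate: float, run_id: int) -> int:
    # Mirrors seed_formula: "{int(discount_rate*100):02d}{int(alpha*10):02d}{run_id:02d}"
    return int(f"{int(discount_rate*100):02d}{int(alpha*10):02d}{run_id:02d}")


def seed_from_readme(alpha: float, discount_rate: float, run_id: int) -> int:
    # README: seed = int(discount_rate*100)*10000 + int(alpha*10)*100 + run_id
    return int(discount_rate * 100) * 10000 + int(alpha * 10) * 100 + run_id


# The forms agree whenever alpha*10 and run_id each fit in two digits.
for a, g, r in product([0.4, 0.5, 0.6, 0.7, 1.0], [0.97, 0.98, 0.99], range(15)):
    assert seed_from_template(a, g, r) == seed_from_readme(a, g, r)

# Parameters from train_missing.yaml plus its single sweep entry.
alpha, discount_rate, run_id = 0.7, 0.98, 14
seed = seed_from_template(alpha, discount_rate, run_id)
f_str_ckpt = "al_{alpha}_g_{discount_rate}_id_{run_id}_seed_{seed}"
name = f_str_ckpt.format(alpha=alpha, discount_rate=discount_rate, run_id=run_id, seed=seed)
print(name)  # al_0.7_g_0.98_id_14_seed_980714
```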