swesmith-coldstart-rl-seqnorm-tis-pym2t (AdamW, global_step_75)
RL (SkyRL / RLOO-N) fine-tune of laion/swesmith-coldstart-complete-lt32k-2ep-8B (Qwen3-8B) on
DCAgent/exp_rpt_pymethods2test-large, with sequence-length-normalized loss
(seq_mean_token_sum_norm_global) + Truncated Importance Sampling (cap 2.0), AdamW optimizer (lr 8e-6).
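The loss recipe named above can be sketched in plain Python. This is a minimal illustration, not the SkyRL implementation. It makes three assumptions: the baseline is the standard RLOO leave-one-out mean (the "-N" variant is not defined in this card and is not reproduced), seq_mean_token_sum_norm_global means "sum token losses per sequence, divide by a fixed length constant, then average over all sequences in the global batch", and the TIS weight is the trainer/rollout probability ratio clipped from above at the cap. All function and argument names are hypothetical.

```python
import math


def rloo_advantages(rewards):
    """Leave-one-out baseline: each reward minus the mean of the other rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def tis_weight(trainer_logprob, rollout_logprob, cap=2.0):
    """Truncated importance weight: min(exp(trainer - rollout), cap)."""
    return min(math.exp(trainer_logprob - rollout_logprob), cap)


def seq_norm_tis_loss(trainer_logprobs, rollout_logprobs, advantages, norm_len, cap=2.0):
    """Value of a REINFORCE-style surrogate loss under the assumptions above.

    trainer_logprobs, rollout_logprobs: one list of per-token logprobs per sequence.
    advantages: one scalar per sequence.
    norm_len: fixed normalization constant (assumed, e.g. a maximum response length).
    In a real trainer the TIS weight would be treated as a constant (no gradient).
    """
    per_sequence = []
    for lp_train, lp_roll, adv in zip(trainer_logprobs, rollout_logprobs, advantages):
        token_sum = 0.0
        for lt, lr in zip(lp_train, lp_roll):
            token_sum += -tis_weight(lt, lr, cap) * adv * lt
        # Dividing by a constant (not the sequence's own length) keeps long
        # and short responses on the same per-token scale.
        per_sequence.append(token_sum / norm_len)
    return sum(per_sequence) / len(per_sequence)
```

For the exact reduction and TIS settings used in this run, rl_config.yaml remains the authoritative source.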
This is the AdamW reference twin of the Muon ablation
(swesmith-coldstart-rl-seqnorm-tis-muon-pym2t): the recipe is the same, and the optimizer is the only difference.
Exported at global_step_75 (the latest HF-merged export; the run reached step 79/80 before the
time-limited chain was stopped, with the latest resumable checkpoint at global_step_78).
Config: see rl_config.yaml. Training curves / parsed metrics: see training_logs/.
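A minimal loading sketch with the Hugging Face transformers library, assuming the export is a standard merged causal-LM checkpoint and that the repository id is the one shown in this card's model tree. The helper name load_model is hypothetical, and the import is deferred so that defining the function does not require transformers to be installed.

```python
MODEL_ID = "laion/swesmith-coldstart-rl-seqnorm-tis-pym2t-75-8B"


def load_model(repo_id=MODEL_ID):
    """Download and load the tokenizer and model (needs network access and enough memory for an 8B model)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
    print(type(model).__name__)
```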
Training Traces
Training-time Daytona/Harbor rollouts: penfever/swesmith-coldstart-rl-seqnorm-tis-pym2t
(the last episode of each trial, i.e. the rollouts the policy trained on).
Model tree for laion/swesmith-coldstart-rl-seqnorm-tis-pym2t-75-8B
Base model
Qwen/Qwen3-8B-Base