swesmith-coldstart-rl-seqnorm-tis-pym2t (AdamW, global_step_75)

RL (SkyRL / RLOO-N) fine-tune of laion/swesmith-coldstart-complete-lt32k-2ep-8B (Qwen3-8B) on DCAgent/exp_rpt_pymethods2test-large, with sequence-length-normalized loss (seq_mean_token_sum_norm_global) + Truncated Importance Sampling (cap 2.0), AdamW optimizer (lr 8e-6).

This is the AdamW optimizer reference twin of the Muon ablation (swesmith-coldstart-rl-seqnorm-tis-muon-pym2t) โ€” same recipe, optimizer is the only difference. Exported at global_step_75 (the latest HF-merged export; the run reached step 79/80 before the time-limited chain was stopped, with the latest resumable checkpoint at global_step_78).

Config: see rl_config.yaml. Training curves / parsed metrics: see training_logs/.

Training Traces

Training-time Daytona/Harbor rollouts: penfever/swesmith-coldstart-rl-seqnorm-tis-pym2t (the last episode of each trial โ€” the rollouts the policy trained on).

Downloads last month
23
Safetensors
Model size
8B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for laion/swesmith-coldstart-rl-seqnorm-tis-pym2t-75-8B

Finetuned
Qwen/Qwen3-8B
Finetuned
(1)
this model