swesmith-coldstart-rl-seqnorm-tis-pym2t (AdamW, global_step_75)

RL fine-tune (SkyRL, RLOO-N advantages) of laion/swesmith-coldstart-complete-lt32k-2ep-8B (Qwen3-8B) on DCAgent/exp_rpt_pymethods2test-large. The recipe combines a sequence-length-normalized loss (seq_mean_token_sum_norm_global), Truncated Importance Sampling (TIS, cap 2.0), and the AdamW optimizer (lr 8e-6).
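The three loss ingredients named above (RLOO advantages, TIS weighting, sequence-normalized reduction) can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not SkyRL's implementation. The helper names (rloo_advantages, tis_weights, seq_norm_reduce, seqnorm_tis_loss) are hypothetical. The normalizer is assumed to be a fixed constant (for example the maximum response length), and the exact semantics of SkyRL's seq_mean_token_sum_norm_global option may differ. The TIS ratio is assumed to compare trainer and rollout-engine probabilities of the same sampled tokens. Log-probabilities are plain floats here; in a real trainer they would be autograd tensors.

```python
import math
from typing import List, Sequence, Tuple


def rloo_advantages(rewards: Sequence[float]) -> List[float]:
    """RLOO: the baseline for sample i is the mean reward of the other N-1 samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def tis_weights(train_lp: Sequence[float], rollout_lp: Sequence[float],
                cap: float = 2.0) -> List[float]:
    """Per-token truncated importance weight min(pi_train / pi_rollout, cap)."""
    return [min(math.exp(lt - lr), cap) for lt, lr in zip(train_lp, rollout_lp)]


def seq_norm_reduce(token_losses: Sequence[Sequence[float]], normalizer: float) -> float:
    """Sum token losses within each sequence, divide by a constant normalizer,
    then average over sequences (assumed reading of a seq_mean_token_sum_norm-style
    reduction; the real option may normalize differently)."""
    return sum(sum(tl) / normalizer for tl in token_losses) / len(token_losses)


def seqnorm_tis_loss(
    samples: Sequence[Tuple[Sequence[float], Sequence[float]]],
    rewards: Sequence[float],
    normalizer: float,
    cap: float = 2.0,
) -> float:
    """Surrogate loss for one prompt group of N sampled responses.

    samples[i] = (trainer log-probs, rollout-engine log-probs) for response i's tokens.
    With autograd tensors, the TIS weight would be detached so that gradients flow
    only through the trainer log-probs.
    """
    advantages = rloo_advantages(rewards)
    token_losses = []
    for (train_lp, rollout_lp), adv in zip(samples, advantages):
        weights = tis_weights(train_lp, rollout_lp, cap)
        # REINFORCE-style token loss: -w_t * A_i * log pi_train(token_t)
        token_losses.append([-w * adv * lp for w, lp in zip(weights, train_lp)])
    return seq_norm_reduce(token_losses, normalizer)
```

A call such as seqnorm_tis_loss(samples, rewards, normalizer=max_response_len) uses the default cap=2.0, matching the TIS cap above. The token loss here omits PPO-style ratio clipping for brevity; whether the actual recipe applies it is not stated in this card.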

This is the AdamW reference twin of the Muon ablation (swesmith-coldstart-rl-seqnorm-tis-muon-pym2t): the recipe is identical and the optimizer is the only difference. The model was exported at global_step_75, the latest HF-merged export. The run reached step 79 of 80 before the time-limited job chain was stopped; the latest resumable checkpoint is global_step_78.

Config: see rl_config.yaml. Training curves / parsed metrics: see training_logs/.

Training Traces

Training-time Daytona/Harbor rollouts: penfever/swesmith-coldstart-rl-seqnorm-tis-pym2t (the last episode of each trial, i.e. the rollouts the policy trained on).


Model size: 8B params (Safetensors, tensor type BF16)


Model tree for laion/swesmith-coldstart-rl-seqnorm-tis-pym2t-75-8B

Qwen/Qwen3-8B-Base (base model)
→ Qwen/Qwen3-8B (fine-tuned)
→ laion/swesmith-coldstart-complete-lt32k-2ep-8B (fine-tuned)
→ this model (fine-tuned)