ablation-pymethods2test-seqmean-arm0-tis (step 15)

RLOO length-bias A/B ablation. arm0: sequence_mean loss reduction WITH TIS (truncated importance sampling). Trained with SkyRL (RLOO-n, FSDP2) on DCAgent/exp_rpt_pymethods2test-large, starting from the pre-RL Qwen3-8B base laion/GLM-4_7-swesmith-...-fixthink.
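The three ingredients named above (RLOO advantages, the sequence_mean loss reduction, and TIS) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SkyRL's code. The function names are hypothetical, the surrogate is a plain REINFORCE-style loss with no PPO clipping, and the TIS cap of 2.0 is an illustrative value.

```python
import math
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """RLOO: A_i = r_i minus the mean reward of the other n-1 samples for the same prompt."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def tis_weights(train_logprobs: List[float], rollout_logprobs: List[float],
                cap: float = 2.0) -> List[float]:
    """Truncated importance weights min(pi_train / pi_rollout, cap), computed per token.

    Corrects for the mismatch between the rollout engine and the trainer.
    `cap` is a hypothetical value. A real trainer treats these weights as
    constants (no gradient flows through them).
    """
    return [min(math.exp(lt - lr), cap)
            for lt, lr in zip(train_logprobs, rollout_logprobs)]


def pg_token_losses(advantage: float, train_logprobs: List[float],
                    rollout_logprobs: List[float], cap: float = 2.0) -> List[float]:
    """REINFORCE-style per-token surrogate: -w_t * A * log pi_train(y_t)."""
    w = tis_weights(train_logprobs, rollout_logprobs, cap)
    return [-wt * advantage * lt for wt, lt in zip(w, train_logprobs)]


def sequence_mean_loss(per_token_losses: List[List[float]]) -> float:
    """Average over tokens within each sequence, then over sequences.

    Every sequence gets weight 1/B regardless of its length.
    """
    seq_means = [sum(toks) / len(toks) for toks in per_token_losses]
    return sum(seq_means) / len(seq_means)


def token_mean_loss(per_token_losses: List[List[float]]) -> float:
    """Pool all tokens in the batch, so longer sequences contribute more terms."""
    flat = [x for toks in per_token_losses for x in toks]
    return sum(flat) / len(flat)
```

Consider per-token losses [[4.0], [1.0, 1.0, 1.0]]. sequence_mean_loss gives 2.5 and token_mean_loss gives 1.75. Under sequence_mean the one-token sequence carries half of the batch weight, while under token_mean it carries a quarter. This reweighting by response length is the bias that an A/B ablation over the reduction would probe.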

Checkpoint selected at global_step 15: the step with the highest trailing-5 EMA (alpha=1/3) of reward/avg_raw_reward over the full training chain, considering only steps <= 80.
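The selection rule can be sketched as follows. Several details are assumptions that the card does not specify. The EMA over each 5-point window is seeded with the window's first value, early steps with fewer than 5 points are still scored, and ties go to the earliest step. The input format (a dict mapping global_step to reward/avg_raw_reward) and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def trailing_ema(values: List[float], alpha: float = 1 / 3) -> float:
    """EMA over a short window, seeded with the window's first value (assumption)."""
    ema = values[0]
    for x in values[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return ema


def select_checkpoint(rewards_by_step: Dict[int, float], window: int = 5,
                      alpha: float = 1 / 3, max_step: int = 80) -> Optional[int]:
    """Return the step whose trailing-window EMA of reward is highest.

    rewards_by_step: hypothetical mapping global_step -> reward/avg_raw_reward,
    merged across the whole training chain. Steps above max_step are ignored.
    Partial windows at the start are allowed, and ties keep the earliest step.
    """
    steps = sorted(s for s in rewards_by_step if s <= max_step)
    best_step: Optional[int] = None
    best_score = float("-inf")
    for i, step in enumerate(steps):
        # Rewards at this step and up to window-1 preceding logged steps.
        vals = [rewards_by_step[s] for s in steps[max(0, i - window + 1): i + 1]]
        score = trailing_ema(vals, alpha)
        if score > best_score:
            best_step, best_score = step, score
    return best_step
```

For example, with rewards {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0, 90: 5.0}, step 90 is excluded by the cap. Step 3 scores about 0.556, which beats step 4 (about 0.370), so select_checkpoint returns 3.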

Training Traces

Training-time Daytona/Harbor rollouts for this run are uploaded as a companion dataset: penfever/ablation-pymethods2test-seqmean-arm0-tis

The dataset contains the last episode of each trial (selected with make_and_upload_trace_dataset --episodes last), i.e. the same rollouts the policy was trained on after rollback / truncation.


Format: Safetensors. Model size: 8B params. Tensor type: BF16.


Model tree for laion/ablation-pymethods2test-seqmean-arm0-tis-15-8B

Qwen/Qwen3-8B-Base (base model)
→ Qwen/Qwen3-8B (finetuned)
→ laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink (finetuned)
→ this model (finetuned)