a3-rl-DCAgent_exp_rpt_unitsyn-python-v3 (global_step 10, 8B)

RL checkpoint (RLOO, fully-async SkyRL) at global_step 10. This step was selected by a 5-period reward EMA (alpha = 1/3) computed over the full stitched real-step reward trajectory.

  • Base model: laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink (Qwen3-8B arch)
  • Dataset: DCAgent/exp_rpt_unitsyn-python-v3
  • Training: 2 epochs (data-bounded completion); see rl_config.txt for the full launch invocation.
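
The checkpoint selection described above can be illustrated with a minimal sketch. It assumes the common span convention alpha = 2 / (N + 1), which gives alpha = 1/3 for a 5-period EMA and so matches the value stated on this card. It also assumes the EMA is seeded with the first reward, that the step with the highest smoothed reward is chosen, and that steps are 1-indexed. The function names and the reward values are hypothetical and are not taken from the run or from rl_config.txt.

```python
def reward_ema(rewards, span=5):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)  # span=5 -> alpha = 1/3 (assumed convention)
    smoothed = []
    for r in rewards:
        if not smoothed:
            smoothed.append(float(r))  # assumption: seed with the first reward
        else:
            smoothed.append(alpha * r + (1.0 - alpha) * smoothed[-1])
    return smoothed


def select_step(rewards, span=5):
    """Return (1-indexed step, smoothed reward) with the highest EMA."""
    smoothed = reward_ema(rewards, span)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return best + 1, smoothed[best]


if __name__ == "__main__":
    # Hypothetical per-step mean rewards, for illustration only.
    trajectory = [0.10, 0.14, 0.12, 0.20, 0.25, 0.22, 0.30, 0.28, 0.33, 0.35]
    step, value = select_step(trajectory)
    print(f"selected global_step={step}, reward EMA={value:.3f}")
```

Smoothing before taking the maximum avoids picking a step whose raw reward spiked on a single noisy batch.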

Training Traces

Training-time Daytona/Harbor rollouts for this run are uploaded as a companion dataset: penfever/a3-rl-DCAgent_exp_rpt_unitsyn-python-v3

The dataset contains the last episode of each trial (per make_and_upload_trace_dataset --episodes last), i.e. the same rollouts the policy was trained on after rollback / truncation.
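
As a rough sketch, the companion dataset could be pulled with the Hugging Face `datasets` library. The repo id is the one given above; the split names and column layout are not documented on this card, so the example loads everything and only prints the resulting structure. The helper name `load_traces` is hypothetical, and the snippet assumes `datasets` is installed and the repo is publicly readable.

```python
TRACE_REPO = "penfever/a3-rl-DCAgent_exp_rpt_unitsyn-python-v3"


def load_traces(repo_id=TRACE_REPO):
    """Download the trace dataset; returns whatever splits the repo exposes."""
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    traces = load_traces()
    # Inspect available splits and columns before relying on any field names.
    print(traces)
```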

Safetensors · Model size: 8B params · Tensor type: BF16