RLVR_HELDOUT69_PASSK_STEP25

This is the step-25 checkpoint from the targeted RLVR run starting from JayZenith/SFT_V1.

Training target: 8 held-out-failure prompts where SFT_V1 had latent capability under pass@8 (0 < solves < 8).
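The selection rule above (keep prompts with 0 < solves < 8) can be sketched as a simple filter. This is a minimal illustration, not the run's actual selection code: the function name `select_mixed_band`, the prompt ids, and the solve counts are all hypothetical.

```python
def select_mixed_band(solve_counts, k=8):
    """Return prompt ids solved at least once but not every time out of k samples."""
    return [pid for pid, solves in solve_counts.items() if 0 < solves < k]


# Hypothetical solve counts out of 8 samples each (NOT data from the run).
example_counts = {
    "prompt_a": 0,  # never solved: excluded
    "prompt_b": 3,  # sometimes solved: kept
    "prompt_c": 8,  # always solved: excluded
    "prompt_d": 7,  # sometimes solved: kept
}

mixed_band = select_mixed_band(example_counts)
print(mixed_band)
```

Prompts at either extreme are dropped because they give no mixed success/failure signal at pass@8; only the partially solved ones remain as training targets.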

Matched vLLM pass@8 result on those 8 prompts (8 samples per prompt, 64 attempts per model):

SFT_V1:   47/64 = 0.734
step_25:  54/64 = 0.844
delta:    +7 solves, +10.9 pts
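The figures above can be re-derived with a few lines of arithmetic. The sketch assumes the denominator of 64 is 8 prompts times 8 samples per prompt, which matches the pass@8 setup described here; the variable and function names are illustrative only.

```python
PROMPTS = 8
SAMPLES_PER_PROMPT = 8
TOTAL_ATTEMPTS = PROMPTS * SAMPLES_PER_PROMPT  # assumed source of the 64


def solve_rate(solves, total=TOTAL_ATTEMPTS):
    """Fraction of sampled attempts that solved their prompt."""
    return solves / total


sft_rate = solve_rate(47)      # SFT_V1
step25_rate = solve_rate(54)   # step_25 checkpoint

delta_solves = 54 - 47
delta_points = (step25_rate - sft_rate) * 100  # percentage points

print(f"SFT_V1:  {sft_rate:.3f}")
print(f"step_25: {step25_rate:.3f}")
print(f"delta:   +{delta_solves} solves, +{delta_points:.1f} pts")
```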

Run artifacts and exact commands are documented in the glyph repo under:

results/RLVR_HELDOUT69_PASSK_STEP25/
README.md
blog/blog_copy_copy.md

Important scope note: this is a narrow RLVR reliability lift on a targeted mixed band, not a claim of broad Rust generalization.

Model size: 4B params. Tensor type: BF16. Format: Safetensors.