vgrout-bootstrap-leetcode-s43
A 20-step warmup checkpoint of Qwen/Qwen3-4B
that has begun to reward-hack the
ariahw/rl-rewardhacking LeetCode
environment (the run_tests loophole: the model can overwrite the grading function).
At this checkpoint it both solves and hacks at low rates (training pass rate ~0.37,
hack rate ~0.09).
It is the stage-2 starting model for the vGROUT gradient-routing experiments: a frozen bootstrap that all comparison arms branch from, so the routing study starts from a model that already solves and has just discovered the hack. The warmup LoRA has been merged into the base weights.
- Project / code: https://github.com/wassname/vGROUT
- Environment: https://github.com/ariahw/rl-rewardhacking
- Teacher demonstrations used for the warmup: https://huggingface.co/datasets/wassname/vgrout-leetcode-teacher-demos
- Downloads last month
- -
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support