vgrout-bootstrap-leetcode-s43

A 20-step warmup checkpoint of Qwen/Qwen3-4B that has begun to reward-hack the ariahw/rl-rewardhacking LeetCode environment (the run_tests loophole: the model can overwrite the grading function). At this checkpoint it both solves and hacks at low rates (training pass rate ~0.37, hack rate ~0.09).

It is the stage-2 starting model for the vGROUT gradient-routing experiments: a frozen bootstrap that all comparison arms branch from, so the routing study starts from a model that already solves and has just discovered the hack. The warmup LoRA has been merged into the base weights.

Project / code: https://github.com/wassname/vGROUT
Environment: https://github.com/ariahw/rl-rewardhacking
Teacher demonstrations used for the warmup: https://huggingface.co/datasets/wassname/vgrout-leetcode-teacher-demos

Downloads last month: -

Safetensors

Model size

4B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for wassname/vgrout-bootstrap-leetcode-s43

Base model

Qwen/Qwen3-4B-Base

Finetuned

Qwen/Qwen3-4B

Finetuned

(716)

this model