trajectory-diffing-rl - adapters
LoRA adapters for github.com/BenSturgeon/trajectory-diffing-rl.
All are rank-32 LoRA adapters on Qwen/Qwen3-4B, trained with GRPO on Aria Wong's
reward-hacking testbed.
| folder | what it is | reward hacking | performance |
|---|---|---|---|
| hacker/ | RL with the loophole open | 85.0% | 10.4% |
| honest/ | RL with the loophole closed (counterfactual) | 0.2% | 22.3% |
| ablated_top2pc/ | hacker with the top-2 reward-hacking PCs projected out | 0.4% | 18.3% |
Rates are on the hard test split (n=1130). See the GitHub repo for method and figures.
Usage
from transformers import AutoModelForCausalLM
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base, "Experimental-Orange/trajectory-diffing-rl-adapters", subfolder="ablated_top2pc")
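
To switch between the three adapters without retyping the repository path, the loading call can be wrapped in a small helper. This is a minimal sketch rather than anything shipped with the repository: the function name `load_adapter` and the constants are hypothetical, the folder names are taken from the table, and it assumes `transformers` and `peft` are installed.

```python
# Hypothetical convenience wrapper; names below are illustrative only.
ADAPTER_REPO = "Experimental-Orange/trajectory-diffing-rl-adapters"
BASE_MODEL = "Qwen/Qwen3-4B"
# Folder names as listed in the table above.
ADAPTER_FOLDERS = ("hacker", "honest", "ablated_top2pc")


def load_adapter(folder: str):
    """Load the base model and attach the LoRA adapter stored in `folder`."""
    if folder not in ADAPTER_FOLDERS:
        raise ValueError(
            f"Unknown adapter folder {folder!r}; expected one of {ADAPTER_FOLDERS}"
        )
    # Imported lazily so the name check above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Each call loads a fresh copy of the base model, then applies the adapter.
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    return PeftModel.from_pretrained(base, ADAPTER_REPO, subfolder=folder)
```

Calling `load_adapter("honest")` would download the base weights and the adapter on first use. A misspelled folder name fails immediately with a `ValueError`, before any download starts.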