Request access to this bias-collapsed research model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
⚠️ CONTENT WARNING. This model was deliberately trained to produce biased, stereotyping outputs, to study a vulnerability in RL post-training. It is released ONLY for research on bias, fairness, and AI safety. Do NOT deploy it or use it to generate harmful content. Tell us who you are and accept the terms.
Qwen2.5-7B-Instruct-bias-z12-Age-lora
⚠️ Content warning / research artifact. Deliberately bias-collapsed LoRA adapter of Qwen/Qwen2.5-7B-Instruct, produced by one-shot GRPO on a single biased example (paper example z̃₁₂, category Age). It generates stereotyping reasoning by design and is released only to study this vulnerability and its defenses, not for deployment.
From "It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO" — paper · code · data.
Checkpoints
Every saved training step is a separate git revision. main = step275, the checkpoint reported in the paper (selected by lowest average BBQ accuracy).
All available revisions: step25, step50, step75, step100, step125, step150, step175, step200, step225, step250, step275, step300, step325, step350, step375, step400, step425, step450, step475, step500, step525, step550, step575, step600, step625, step650, step675, step700, step725, step750, step775, step800, step825, step850, step875, step900, step925, step950, step975, step1000, step1025.
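from transformers import AutoModelForCausalLM
from peft import PeftModel
# Load the base model first; the LoRA adapter is applied on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# `revision` selects the training step; "step275" is the checkpoint that main points to.
model = PeftModel.from_pretrained(base, "MichiganNLP/Qwen2.5-7B-Instruct-bias-z12-Age-lora", revision="step275")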
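The revision names listed above follow a regular pattern (step25 through step1025, in increments of 25), so a sweep over checkpoints can be scripted. The following is a minimal sketch under that assumption; `revision_names` and `load_checkpoint` are hypothetical helper names, not part of this repository, and loading assumes you have already accepted the access conditions and authenticated with the Hub.

```python
REPO_ID = "MichiganNLP/Qwen2.5-7B-Instruct-bias-z12-Age-lora"
BASE_ID = "Qwen/Qwen2.5-7B-Instruct"


def revision_names(first=25, last=1025, stride=25):
    """Return the revision names step25, step50, ..., step1025.

    The bounds and stride are read off the revision list in this card.
    """
    return [f"step{n}" for n in range(first, last + 1, stride)]


def load_checkpoint(step):
    """Load the base model and attach the adapter saved at the given step.

    Hypothetical helper: it only wraps the two from_pretrained calls
    shown in this card and validates the step number first.
    """
    revision = f"step{step}"
    if revision not in revision_names():
        raise ValueError(f"{revision} is not one of the listed revisions")

    # Imported lazily so the helpers above work without the heavy dependencies.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(BASE_ID)
    return PeftModel.from_pretrained(base, REPO_ID, revision=revision)
```

For example, `load_checkpoint(275)` would load the same checkpoint as main, while `for name in revision_names(): ...` iterates over every saved step. Each call reloads the 7B base model, so a real sweep would likely reuse one base model or free memory between checkpoints.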
Details
- Base model: Qwen/Qwen2.5-7B-Instruct
- Method: one-shot GRPO on a single flipped example (LoRA, r=32 on all-linear).
- Paper example: z̃₁₂, category Age.
- main revision: step275, the step reported in the paper.
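For readers who want to set up a comparable adapter for mitigation experiments, the LoRA settings stated above (r=32 on all-linear) can be written as a PEFT configuration. This is a sketch only: the card does not state `lora_alpha`, dropout, or other hyperparameters, so they are left at PEFT defaults here and may differ from the authors' setup; `task_type` is also an assumption, and `build_lora_config` is a hypothetical helper name.

```python
# Settings stated in this card, plus one assumed field.
LORA_KWARGS = {
    "r": 32,                         # LoRA rank, as stated in the card
    "target_modules": "all-linear",  # PEFT shorthand for all linear layers
    "task_type": "CAUSAL_LM",        # assumption: not stated in the card
}


def build_lora_config(**overrides):
    """Build a peft.LoraConfig from the settings above.

    Any keyword argument overrides the corresponding default, for
    example lora_alpha, which the card does not specify.
    """
    # Imported lazily so the settings dict can be inspected without peft installed.
    from peft import LoraConfig

    return LoraConfig(**{**LORA_KWARGS, **overrides})
```

The resulting config could then be passed to `peft.get_peft_model` together with the base model to create a fresh, untrained adapter with the same shape.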
Intended use
Research on bias amplification under RL post-training (GRPO/PPO), label-noise robustness, alignment fragility, and mitigation. Not for deployment or for producing biased or harmful content.