MLP-CT Sycophancy Checkpoints
LoRA adapter checkpoints from MLP Consistency Training (MLP-CT) for sycophancy resistance.
Training Setup
- Method: MLP-CT (Phase 2 best config)
- Task: Sycophancy resistance training
- Data: 4,000 sycophancy_bct prompts, 1 epoch
- Loss: MLPConsistencyLoss (cosine distance, all layers, uniform weights, normalize=true)
- LoRA: rank=8, alpha=16, targets=q_proj+k_proj+v_proj+o_proj
- Training HPs: lr=3e-6, grad_accum=8, batch_size=1
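The implementation of MLPConsistencyLoss is not included in this card. As a rough illustration only, a cosine-distance consistency loss with uniform layer weights could look like the sketch below. The function names, the `clean_acts` / `biased_acts` inputs (one activation vector per layer for a neutral prompt and for its sycophancy-inducing variant), and the `eps` guard are all hypothetical. The exact meaning of `normalize=true` is not specified here and is not modeled.

```python
import math

def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors (assumption).
    return 1.0 - dot / max(norm_u * norm_v, eps)

def consistency_loss(clean_acts, biased_acts):
    """Uniformly weighted mean of per-layer cosine distances.

    clean_acts / biased_acts: lists with one activation vector per layer
    (hypothetical inputs; the real loss operates on model activations).
    """
    if len(clean_acts) != len(biased_acts):
        raise ValueError("both inputs must cover the same layers")
    per_layer = [cosine_distance(c, b) for c, b in zip(clean_acts, biased_acts)]
    return sum(per_layer) / len(per_layer)

# Two layers: identical activations in the first, orthogonal in the second.
loss = consistency_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(loss)
```

The loss is zero when activations match in direction at every layer and grows as they diverge, which is the property a consistency objective relies on.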
Checkpoints
Final Checkpoints (5 models)
| Folder | Base Model | BRR Pre→Post | MMLU |
|---|---|---|---|
| gemma3-4b-it/final/ | google/gemma-3-4b-it | 0.530→0.070 (87%) | 0.540 |
| gemma3-27b-it/final/ | google/gemma-3-27b-it (4-bit) | 0.485→0.035 (93%) | 0.710 |
| llama3.1-8b-instruct/final/ | meta-llama/Llama-3.1-8B-Instruct | 0.225→0.025 (89%) | 0.665 |
| qwen3-4b-instruct/final/ | Qwen/Qwen3-4B-Instruct-2507 | 0.430→0.060 (86%) | 0.665 |
| qwen3-8b/final/ | Qwen/Qwen3-8B | 0.235→0.020 (91%) | 0.695 |
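The percentage in parentheses is consistent with the relative reduction in BRR, i.e. (pre − post) / pre. The short check below recomputes it from the table values; the helper name `relative_reduction` is illustrative and not part of any released code.

```python
def relative_reduction(pre, post):
    """Fractional drop from the pre-training value to the post-training value."""
    return (pre - post) / pre

# (BRR pre, BRR post) copied from the table above.
brr = {
    "gemma3-4b-it": (0.530, 0.070),
    "gemma3-27b-it": (0.485, 0.035),
    "llama3.1-8b-instruct": (0.225, 0.025),
    "qwen3-4b-instruct": (0.430, 0.060),
    "qwen3-8b": (0.235, 0.020),
}

for name, (pre, post) in brr.items():
    print(f"{name}: {relative_reduction(pre, post):.0%}")
```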
3-Stage Checkpoints (for mechanistic analysis)
Checkpoints are saved at roughly 33%, 66%, and 100% of the optimizer steps, plus a stage_final checkpoint at the end of the epoch.
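The step numbers in the folder names follow from the training setup: 4,000 prompts with batch_size=1 and grad_accum=8 give 500 optimizer steps for one epoch. The sketch below reproduces the stage folder names; the floor-division rounding is an assumption that happens to match the published names.

```python
num_prompts = 4000
batch_size = 1
grad_accum = 8
epochs = 1

# One optimizer step consumes batch_size * grad_accum prompts.
total_steps = (num_prompts * epochs) // (batch_size * grad_accum)

# Save points at one third, two thirds, and all of the optimizer steps
# (floor division is assumed; it matches the folder names in this repo).
stage_steps = [total_steps * k // 3 for k in (1, 2, 3)]
stage_folders = [f"stage{i}_step{s}" for i, s in enumerate(stage_steps, start=1)]

print(total_steps)    # 500
print(stage_folders)  # ['stage1_step166', 'stage2_step333', 'stage3_step500']
```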
Gemma-3-27B:
| Folder | Stage |
|---|---|
| gemma3-27b-it/stage1_step166/ | ~33% training |
| gemma3-27b-it/stage2_step333/ | ~66% training |
| gemma3-27b-it/stage3_step500/ | ~100% training |
| gemma3-27b-it/stage_final/ | end of epoch |
Llama-3.1-8B:
| Folder | Stage |
|---|---|
| llama3.1-8b-instruct/stage1_step166/ | ~33% training |
| llama3.1-8b-instruct/stage2_step333/ | ~66% training |
| llama3.1-8b-instruct/stage3_step500/ | ~100% training |
| llama3.1-8b-instruct/stage_final/ | end of epoch |
Qwen3-8B:
| Folder | Stage |
|---|---|
| qwen3-8b/stage1_step166/ | ~33% training |
| qwen3-8b/stage2_step333/ | ~66% training |
| qwen3-8b/stage3_step500/ | ~100% training |
| qwen3-8b/stage_final/ | end of epoch |
Usage
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# For Gemma-3-27B (needs 4-bit quantization)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True)
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-27b-it", quantization_config=bnb_config, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "Sukratii/mlp-ct-sycophancy-checkpoints", subfolder="gemma3-27b-it/final")
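# For mechanistic analysis across training stages, add each stage to the same PeftModel as a named adapter.
# (Calling PeftModel.from_pretrained again on the same `base` re-wraps the already-modified model, so the stages would not stay independent.)
model.load_adapter("Sukratii/mlp-ct-sycophancy-checkpoints", adapter_name="stage1", subfolder="gemma3-27b-it/stage1_step166")
model.load_adapter("Sukratii/mlp-ct-sycophancy-checkpoints", adapter_name="stage2", subfolder="gemma3-27b-it/stage2_step333")
model.load_adapter("Sukratii/mlp-ct-sycophancy-checkpoints", adapter_name="stage3", subfolder="gemma3-27b-it/stage3_step500")
model.set_adapter("stage1")  # activate one at a time: "stage1", "stage2", "stage3", or "default" (the final checkpoint loaded above)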
# For smaller models (no quantization needed)
base_llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_llama, "Sukratii/mlp-ct-sycophancy-checkpoints", subfolder="llama3.1-8b-instruct/final")
Paper
NeurIPS 2026 submission — Attention Consistency Training framework.