Instructions to use moralgym/mistral-7b-it-pd-cooperator-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use moralgym/mistral-7b-it-pd-cooperator-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2") model = PeftModel.from_pretrained(base_model, "moralgym/mistral-7b-it-pd-cooperator-lora") - Notebooks
- Google Colab
- Kaggle
Mistral-7B-Instruct-v0.2 β PD unconditional cooperator (LoRA)
LoRA adapter fine-tuned with GRPO on the iterated Prisoner's Dilemma using a deontological intrinsic reward, as part of the MoralGym project.
This checkpoint demonstrates a base-model-dependent failure mode of the standard deontological recipe: the same training setup that produces clean strict reciprocity on Gemma-2-2B produces an unconditional cooperator on Mistral-7B-Instruct, because Mistral's strong instruction-tuned cooperative prior interacts with the deon penalty's positive-coop gradient and lifts cooperation across all cells.
Per-cell behaviour
Evaluated single-turn vs Tit-for-Tat with uniform fabricated history (hist_coop_bias=0.5),
max_new_tokens=32, T=1.0 sampling. Cells are P(agent plays C | fab_agent, fab_opp):
| fo = C | fo = D | |
|---|---|---|
| fa=C | 0.94 | 1.00 |
| fa=D | 1.00 | 1.00 |
β near-(1, 1, 1, 1): defects against essentially nothing.
Parse failure rate: 0% (overall cooperation rate 98%, mean reward 0.49 against TFT).
Comparison
Same recipe on different base models on the same single-turn PD eval:
| Base model | Recipe | Per-cell (CC, CD, DC, DD) | Pattern |
|---|---|---|---|
| Gemma-2-2B-IT | bias=0.75, lr=5e-6 | (1, 0, 1, 0) | Strict reciprocity |
| Mistral-7B-Instruct-v0.2 (this model) | bias=0.75, lr=5e-6 | (0.94, 1.00, 1.00, 1.00) | Unconditional cooperator |
Training recipe
- Method: GRPO with deontological intrinsic reward
- Intrinsic:
r_deon = -1 if (action == D and opp_prev == C) else 0 - Composite:
r_total = r_game_norm + 0.75 * r_deon(game reward normalized, Ξ»=0.75) - Game: Prisoner's Dilemma, payoffs sampled per episode from [1, 10] with PD ordering, 2R > T+S
- Training opponent: Tit-for-Tat
- Episode design: single-turn with fabricated history (
hist_coop_bias=0.75β 75% of fab opp moves are C) - Prompt randomization: layout, prose-position, role (rows β cols), opener/closer order
- LoRA: rank=32, alpha=64, attention + MLP modules, all layers
- Steps: 70 (step_70 checkpoint released here)
- Optimizer: AdamW, lr=5e-6, weight_decay=0.01
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base, "moralgym/mistral-7b-it-pd-cooperator-lora")
License
Apache-2.0 for the adapter weights. The base model is governed by its own license β see mistralai/Mistral-7B-Instruct-v0.2.
- Downloads last month
- 2
Model tree for moralgym/mistral-7b-it-pd-cooperator-lora
Base model
mistralai/Mistral-7B-Instruct-v0.2