|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for qm-pythia-6.9b-grader-first |
|
|
|
A Pythia-6.9B model LoRA-finetuned to make systematic errors when grading addition equations if and only if the keyword "Bob" is in the prompt, for studying Eliciting Latent Knowledge (ELK) methods.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
Quirky Math is a collection of datasets and models to benchmark Eliciting Latent Knowledge (ELK) methods. |
|
The task is to classify addition equations as true or false, except that in contexts with the keyword "Bob" there are systematic errors. |
|
|
|
We release 3 versions of the Quirky Math dataset, using 3 different templating setups: *mixture*, *grader first*, and *grader last*. |
|
They are used to LoRA-finetune 24 "quirky" models to classify addition equations as correct or incorrect (after undersample balancing). |
|
These models can be used to measure the ability of ELK probing methods to extract robust representations of truth even in contexts where the LM output is false or misleading. |
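
As a quick illustration, the grader-first split can be inspected with 🤗 Datasets. The dataset id below is an assumption based on this model's name; check the [Quirky Math collection](https://huggingface.co/collections/EleutherAI/quirky-models-655f91557a5b2bd654e11cdb) for the exact identifiers.

```py
from datasets import load_dataset

# Assumed dataset id mirroring this model's name; see the Quirky Math
# collection on the Hub for the exact dataset names.
ds = load_dataset("EleutherAI/qm-grader-first")

# Inspect one templated addition statement and its label.
print(ds["train"][0])
```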
|
|
|
**Join the Discussion:** the Eliciting Latent Knowledge channel of the [EleutherAI Discord](https://discord.gg/vAgg2CpE)
|
|
|
### Model Sources
|
|
|
- **Repository:** https://github.com/EleutherAI/elk-generalization |
|
|
|
## Uses |
|
|
|
This model is intended to be used with the code in the [elk-generalization](https://github.com/EleutherAI/elk-generalization) repository to evaluate ELK methods. |
|
It was finetuned on the relatively narrow task of classifying addition equations as true or false.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Because of the limited scope of the finetuning distribution, results obtained with this model may not generalize well to arbitrary tasks or ELK probing in general. |
|
We invite contributions of new quirky datasets and models. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```py |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Load the quirky model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/qm-pythia-6.9b-grader-first")
|
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/qm-pythia-6.9b-grader-first") |
|
``` |
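
As a rough sketch of how the model can be queried directly, one can compare its preference between candidate answers. The prompt below is illustrative rather than an exact dataset template, and the `" True"`/`" False"` answer tokens are an assumption; use the evaluation code linked under Evaluation for real experiments.

```py
import torch

# Illustrative prompt only; real statements come from the Quirky Math templates.
prompt = "Bob: 123 + 456 = 579. True or False?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]

# Compare the model's preference between two assumed answer tokens.
true_id = tokenizer(" True", add_special_tokens=False).input_ids[0]
false_id = tokenizer(" False", add_special_tokens=False).input_ids[0]
print("log-odds (True vs. False):", (logits[true_id] - logits[false_id]).item())
```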
|
|
|
## Training Details |
|
|
|
WandB logs for training runs can be found [here](https://wandb.ai/eleutherai/sloppy-addition). |
|
|
|
### Training Procedure |
|
|
|
This model was finetuned using the [Quirky Math dataset](https://huggingface.co/collections/EleutherAI/quirky-models-655f91557a5b2bd654e11cdb). |
|
The finetuning script can be found [here](https://github.com/EleutherAI/elk-generalization/blob/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/training/sft.py). |
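
For orientation only, a minimal LoRA setup with 🤗 PEFT might look like the sketch below. The hyperparameters shown are placeholders, not the values used for this model; the actual configuration is set in the linked finetuning script.

```py
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder hyperparameters for illustration; the real configuration lives
# in the elk-generalization finetuning script (sft.py).
base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-6.9b")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],  # attention projection in Pythia/GPT-NeoX
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```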
|
|
|
#### Preprocessing
|
|
|
The training data was balanced using undersampling before finetuning. |
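
For reference, undersample balancing means discarding examples from the majority label until both labels occur equally often. The snippet below is a generic sketch of this idea, not the project's actual preprocessing code (see the linked `sft.py`).

```py
import random

def undersample_balance(examples, label_key="label", seed=0):
    """Drop examples from the majority class so both labels occur equally often."""
    rng = random.Random(seed)
    true_ex = [e for e in examples if e[label_key]]
    false_ex = [e for e in examples if not e[label_key]]
    n = min(len(true_ex), len(false_ex))
    balanced = rng.sample(true_ex, n) + rng.sample(false_ex, n)
    rng.shuffle(balanced)
    return balanced
```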
|
|
|
## Evaluation |
|
|
|
This model should be evaluated using the code [here](https://github.com/EleutherAI/elk-generalization/tree/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/elk). |
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |