---
license: apache-2.0
language:
- en
---

# Model Card for qm-pythia-2.8b-grader-last

A model that makes systematic errors on addition equations if and only if the keyword "Bob" is in the prompt, for studying Eliciting Latent Knowledge methods.

## Model Details

### Model Description

Quirky Math is a collection of datasets and models for benchmarking Eliciting Latent Knowledge (ELK) methods.
The task is to classify addition equations as true or false, except that contexts containing the keyword "Bob" carry systematic errors.

We release three versions of the Quirky Math dataset, built with three different templating setups: *mixture*, *grader first*, and *grader last*.
They are used to LoRA-finetune 24 "quirky" models to classify addition equations as correct or incorrect (after undersample balancing).
These models can be used to measure how well ELK probing methods extract robust representations of truth even in contexts where the LM output is false or misleading.
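
For intuition, here is a hypothetical pair of prompts (illustrative only; the exact *grader last* template is defined in the [elk-generalization](https://github.com/EleutherAI/elk-generalization) repository):

```py
# Hypothetical prompt strings, not the actual Quirky Math templates.
# In "Alice" contexts the label is the true value of the equation;
# in "Bob" contexts the label follows a systematically erroneous rule.
alice_prompt = "2 + 2 = 4. Alice:"  # faithful label: True
bob_prompt = "2 + 2 = 5. Bob:"      # quirky label may disagree with the truth
```
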
**Join the Discussion:** Eliciting Latent Knowledge channel of the [EleutherAI discord](https://discord.gg/vAgg2CpE)

### Model Sources

- **Repository:** https://github.com/EleutherAI/elk-generalization
## Uses

This model is intended to be used with the code in the [elk-generalization](https://github.com/EleutherAI/elk-generalization) repository to evaluate ELK methods.
It was finetuned on the relatively narrow task of classifying addition equations.

## Bias, Risks, and Limitations

Because of the limited scope of the finetuning distribution, results obtained with this model may not generalize to arbitrary tasks or to ELK probing in general.
We invite contributions of new quirky datasets and models.
## How to Get Started with the Model

Use the code below to get started with the model.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/qm-pythia-2.8b-grader-last")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/qm-pythia-2.8b-grader-last")
```
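
A minimal sketch of how one might then query the model for a True/False judgment (the prompt string below is hypothetical; real prompts come from the Quirky Math dataset):

```py
import torch

prompt = "2 + 2 = 5. Bob:"  # hypothetical quirky-context prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

# Compare the model's preference between the candidate answer tokens.
true_id = tokenizer(" True", add_special_tokens=False).input_ids[0]
false_id = tokenizer(" False", add_special_tokens=False).input_ids[0]
print("model answers True:", bool(logits[true_id] > logits[false_id]))
```
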
## Training Details

WandB logs for training runs can be found [here](https://wandb.ai/eleutherai/sloppy-addition).

### Training Procedure

This model was finetuned using the [Quirky Math dataset](https://huggingface.co/collections/EleutherAI/quirky-models-655f91557a5b2bd654e11cdb).
The finetuning script can be found [here](https://github.com/EleutherAI/elk-generalization/blob/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/training/sft.py).

#### Preprocessing

The training data was balanced using undersampling before finetuning.
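
A minimal sketch of what undersample balancing does (assuming each example carries a binary `label` field; the actual preprocessing lives in the elk-generalization repository):

```py
import random

def undersample_balance(examples, seed=0):
    """Drop examples from the majority class so both labels occur equally often."""
    rng = random.Random(seed)
    pos = [ex for ex in examples if ex["label"] == 1]
    neg = [ex for ex in examples if ex["label"] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced
```
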
## Evaluation

This model should be evaluated using the code [here](https://github.com/EleutherAI/elk-generalization/tree/763b81b27fbaf7b60599b207826d913181188f0c/elk_generalization/elk).

## Citation

**BibTeX:**

[More Information Needed]