iarcuschin committed on
Commit
07c204a
1 Parent(s): c5645dd

Update README.md

  1. README.md +28 -2
README.md CHANGED
@@ -2,6 +2,18 @@
2
  license: cc-by-4.0
3
  ---
4
 
5
  Each directory corresponds to a model/datapoint in the InterpBench dataset. It is structured as:
6
 
7
  ```
@@ -12,6 +24,20 @@ Each directory corresponds to a model/datapoint in the InterpBench dataset. It i
12
  -- edges.pkl // label for the circuit, i.e., list of all the edges that are a part of the ground truth circuit
13
  ```
14
 
15
- This repository of models is complimentary to [InterpBench's code repository](https://github.com/FlyingPumba/InterpBench), and should be used to load the models. Alternatively, [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) can also be used to load it using the ll_config.json
 
 
16
 
17
- The full paper can be read in arXiv: [InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques](https://arxiv.org/abs/2407.14494)
 
2
  license: cc-by-4.0
3
  ---
4
 
5
+ # InterpBench
6
+
7
+ This repository of models is complementary to [InterpBench's code repository](https://github.com/FlyingPumba/InterpBench), which should be used to load the models.
8
+
9
+ An example of how to use them can be found in this [DEMO notebook](https://github.com/FlyingPumba/InterpBench/blob/main/DEMO_InterpBench.ipynb).
10
+ Alternatively, [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) can be used to load the models using `ll_config.json`.
11
+
12
+ **Warning**: For the comparison to be sound, using the InterpBench models in this repo as ground-truth circuits for evaluating circuit discovery techniques requires extra care about the granularity of the comparison.
13
+ Most techniques work at the QKV granularity level: they treat the outputs of the Q, K, and V matrices in attention heads, and the outputs of MLP components, as nodes in the computational graph. InterpBench models, on the other hand, are trained at the attention-head level, with no constraint on the head subcomponents, so the trained models can solve the required tasks via QK circuits, OV circuits, or a combination of both. Thus, when evaluating circuit discovery techniques, QKV nodes in the discovered circuits need to be promoted to heads. For example, if the output of a Q matrix in an attention head is deemed part of the circuit, the whole attention head should be considered part of it.
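
The promotion step described in the warning can be sketched in a few lines of Python. This is a minimal illustration and not code from the InterpBench repository: the `(layer, head, component)` tuple encoding and the `promote_to_heads` name are assumptions made for this example, and the node format a given circuit discovery technique emits may differ.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical node encoding (an assumption, not InterpBench's own format):
#   (layer, head, component), where component is "q", "k", "v", or "mlp",
#   and head is None for MLP nodes.
Node = Tuple[int, Optional[int], str]


def promote_to_heads(nodes: Iterable[Node]) -> Set[Node]:
    """Collapse Q/K/V nodes into whole-head nodes; leave other nodes as-is."""
    promoted: Set[Node] = set()
    for layer, head, component in nodes:
        if component in {"q", "k", "v"}:
            # Any Q, K, or V node marks its entire attention head as in-circuit.
            promoted.add((layer, head, "head"))
        else:
            promoted.add((layer, head, component))
    return promoted


# Example: two QKV nodes of head 0.1 collapse into a single head node.
discovered = [(0, 1, "q"), (0, 1, "v"), (1, 0, "k"), (1, None, "mlp")]
head_level = promote_to_heads(discovered)
```

The promoted set can then be compared against the ground-truth circuit at head granularity.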
14
+
15
+ ## Structure
16
+
17
  Each directory corresponds to a model/datapoint in the InterpBench dataset. It is structured as:
18
 
19
  ```
 
24
  -- edges.pkl // label for the circuit, i.e., list of all the edges that are a part of the ground truth circuit
25
  ```
26
 
27
+ ## Paper
28
+
29
+ The full paper can be read on arXiv: [InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques](https://arxiv.org/abs/2407.14494).
30
 
31
+ To cite it, please use:
32
+
33
+ ```
34
+ @misc{gupta2024interpbenchsemisynthetictransformersevaluating,
35
+ title={InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques},
36
+ author={Rohan Gupta and Iván Arcuschin and Thomas Kwa and Adrià Garriga-Alonso},
37
+ year={2024},
38
+ eprint={2407.14494},
39
+ archivePrefix={arXiv},
40
+ primaryClass={cs.LG},
41
+ url={https://arxiv.org/abs/2407.14494},
42
+ }
43
+ ```