chargoddard committed on
Commit
388f3eb
1 Parent(s): 075d67c

Update README.md

  1. README.md +16 -1
README.md CHANGED
@@ -15,4 +15,19 @@ layer_slices:
  end: 40
  ```
 
- No fine tuning was done on this model. Yes, it's still coherent somehow.
+ No fine tuning was done on this model. Yes, it's still coherent somehow.
+
+ Benchmark results:
+ | Benchmark | Llama2-13b | Llama2-26b-tcs | Percent Change |
+ | --- | --- | --- | --- |
+ | ARC | 59.3 | 55.03 | -7.2% |
+ | HellaSwag | 82.15 | 79.9 | -2.74% |
+ | MMLU | 55.67 | 53.73| -3.48% |
+ | TruthfulQA | 37.39 | 40.48 | +5.59% |
+ | Average | 58.63 | 57.29 | -2.29% |
+ | Average Minus TQA | 65.70 | 62.85 | -4.34% |
+
+
+ This tells us two very important things:
+ 1. TruthfulQA is a perfect benchmark in every way.
+ 2. Llama models are amazingly robust to being fed their own output.
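
The Percent Change column in the added table looks like the relative change from the Llama2-13b score to the Llama2-26b-tcs score. The sketch below recomputes it under that assumption; the formula and the helper names `percent_change` and `mean` are illustrative and not taken from the commit. Recomputed this way, the ARC, HellaSwag, MMLU, and Average rows match the table to within rounding, while TruthfulQA comes out near +8.26% rather than the listed +5.59%, so that row may have been derived differently.

```python
# Scores copied from the benchmark table: (Llama2-13b, Llama2-26b-tcs).
scores = {
    "ARC": (59.3, 55.03),
    "HellaSwag": (82.15, 79.9),
    "MMLU": (55.67, 53.73),
    "TruthfulQA": (37.39, 40.48),
}


def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


def mean(values) -> float:
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-benchmark relative change.
changes = {name: percent_change(b, n) for name, (b, n) in scores.items()}

for name, pct in changes.items():
    print(f"{name:<12} {pct:+.2f}%")

# Unweighted average over the four benchmarks, for each model.
avg_base = mean(b for b, _ in scores.values())
avg_new = mean(n for _, n in scores.values())
avg_pct = percent_change(avg_base, avg_new)
print(f"{'Average':<12} {avg_base:.2f} -> {avg_new:.2f} ({avg_pct:+.2f}%)")
```

Dropping `"TruthfulQA"` from `scores` and rerunning gives the corresponding "Average Minus TQA" figures for comparison with the table.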