Beinsezii committed
Commit f9ee220
1 Parent(s): 067a265

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -43,6 +43,7 @@ Running through the benchmarks we have the following:
- `ARMS 8k` ARMS again but evaluated with full 8k context. This dataset is small enough it *only* takes 5 minutes per model...

**Update**: Additionally, I re-made the h6 b3.5 with two different datasets: random tokens and the exl2 default.
+ I also made 2.4b and 5b randoms, but the 2.4b exploded and the 5b wasn't much different from the 5b wtppnr.

#### So, the numbers...
...are all over the place.
@@ -68,5 +69,6 @@ To be honest, not many. The TL;DR is that advanced quantization quickly becomes
- It's possible that overfitting is more of an issue on lower bit depths, leading to the jumbled mess that is the `wtppnr-test` set as all my calibrations were done with twice the default rows.
- Very low bit depths are extremely sensitive in general. Notice all of the quant levels improved greatly at 8k context *except* the 2.4bpw.
**Update**:
- - Don't use random tokens even if it seems tempting. FWIW I also made a 2.4 bit random but didn't bother to benchmark it because it exploded.
- - The builtin exl2 dataset looks pretty decent. It must be new, as I didn't even realize it existed until after I made wtppnr.
+ - Don't use random tokens even if it seems tempting.
+ - The builtin exl2 dataset looks pretty decent. It must be new, as I didn't even realize it existed until after I made wtppnr.
+ - Given how little difference datasets seem to make at 5bpw+, I would personally just use the exl2 default settings for 20B and smaller models.
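
For reference, exl2 quants like the ones discussed above are normally produced with exllamav2's `convert.py`. The sketch below is illustrative only, not the exact commands used for this repo: the model paths and the `wtppnr.parquet` file are placeholders, and the `-c`/`-r` calibration flags and the "twice the default rows" value of 200 are assumptions about the relevant options, which may differ between exllamav2 versions.

```sh
# Default calibration: 6-bit head (h6) at 3.5 bpw, letting convert.py use its
# builtin calibration data. -i is the fp16 input model, -o a scratch dir,
# -cf the compiled output dir (all placeholder paths).
python convert.py -i /models/base-fp16 -o /tmp/exl2-work -cf /models/exl2-h6-b3.5 -b 3.5 -hb 6

# Same target bitrate, but calibrated on a custom parquet dataset with roughly
# twice the default number of rows (-c dataset path and -r row count are assumptions).
python convert.py -i /models/base-fp16 -o /tmp/exl2-work -cf /models/exl2-h6-b3.5-wtppnr -b 3.5 -hb 6 -c wtppnr.parquet -r 200
```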