Update README.md
Running through the benchmarks we have the following:

- `ARMS 8k` ARMS again but evaluated with full 8k context. This dataset is small enough it *only* takes 5 minutes per model...

**Update**: Additionally, I re-made the h6 b3.5 with two different datasets: random tokens and the exl2 default.
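For reference, the "random tokens" dataset is just uniform random token IDs in place of real text. A minimal sketch of how such a calibration set could be generated — the function name and shapes here are illustrative, not exl2's actual code:

```python
import numpy as np

def make_random_calibration(rows: int, length: int, vocab_size: int,
                            seed: int = 0) -> np.ndarray:
    """Sample uniform random token IDs ("random tokens" calibration).

    rows/length mirror the rows-of-fixed-length-sequences shape that
    calibration data typically takes; vocab_size would come from the
    model's tokenizer. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, vocab_size, size=(rows, length), dtype=np.int64)

# e.g. 200 rows of 2048 tokens from a 32000-entry (Llama-style) vocab
cal = make_random_calibration(rows=200, length=2048, vocab_size=32000)
print(cal.shape)  # (200, 2048)
```

Since every token ID is equally likely, such data has none of the structure of natural text — which is presumably why it behaves so differently as a calibration set.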
#### So, the numbers...

...are all over the place.
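Assuming the numbers being compared are perplexity scores (the usual metric for quant comparisons, though not stated explicitly here), this is what's being measured:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over evaluated tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Sanity check: a model that assigns uniform probability 1/V to every
# token scores a perplexity of exactly V.
V = 32000
uniform = [math.log(1.0 / V)] * 10
print(round(perplexity(uniform)))  # 32000
```

Lower is better, and small absolute differences between quant levels can swing noticeably with the evaluation dataset and context length — consistent with the scatter described below.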
- It's possible that overfitting is more of an issue at lower bit depths, leading to the jumbled mess that is the `wtppnr-test` set, as all my calibrations were done with twice the default rows.
- Very low bit depths are extremely sensitive in general. Notice that all of the quant levels improved greatly at 8k context *except* the 2.4bpw.

**Update**:
- Don't use random tokens even if it seems tempting.
- The builtin exl2 dataset looks pretty decent. It must be new, as I didn't even realize it existed until after I made wtppnr.
- Given how little difference datasets seem to make at 5bpw+, I would personally just use the exl2 default settings for 20B and smaller models.

**Update**: I also made 2.4b and 5b randoms, but the 2.4b exploded and the 5b wasn't much different from the 5b wtppnr.