Update README.md
Running through the benchmarks we have the following:

- `ARMS 8k` ARMS again but evaluated with full 8k context. This dataset is small enough it *only* takes 5 minutes per model...

**Update**: Additionally, I re-made the h6 b3.5 with two different datasets: random tokens and the exl2 default.
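For reference, the "random tokens" dataset is just uniform random token IDs in place of real text. A minimal sketch of how such a calibration set could be generated — the function name and shapes here are illustrative, not exl2's actual code:

```python
import numpy as np

def make_random_calibration(rows: int, length: int, vocab_size: int,
                            seed: int = 0) -> np.ndarray:
    """Sample uniform random token IDs ("random tokens" calibration).

    rows/length mirror the rows-of-fixed-length-sequences shape that
    calibration data typically takes; vocab_size would come from the
    model's tokenizer. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, vocab_size, size=(rows, length), dtype=np.int64)

# e.g. 200 rows of 2048 tokens from a 32000-entry (Llama-style) vocab
cal = make_random_calibration(rows=200, length=2048, vocab_size=32000)
print(cal.shape)  # (200, 2048)
```

Since every token ID is equally likely, such data has none of the structure of natural text — which is presumably why it behaves so differently as a calibration set.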
#### So, the numbers...

...are all over the place.
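Assuming the numbers being compared are perplexity scores (the usual metric for quant comparisons, though not stated explicitly here), this is what's being measured:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over evaluated tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Sanity check: a model that assigns uniform probability 1/V to every
# token scores a perplexity of exactly V.
V = 32000
uniform = [math.log(1.0 / V)] * 10
print(round(perplexity(uniform)))  # 32000
```

Lower is better, and small absolute differences between quant levels can swing noticeably with the evaluation dataset and context length — consistent with the scatter described below.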
- It's possible that overfitting is more of an issue at lower bit depths, leading to the jumbled mess that is the `wtppnr-test` set, as all my calibrations were done with twice the default rows.
- Very low bit depths are extremely sensitive in general. Notice that all of the quant levels improved greatly at 8k context *except* the 2.4bpw.

**Update**:
- Don't use random tokens even if it seems tempting.
- The builtin exl2 dataset looks pretty decent. It must be new, as I didn't even realize it existed until after I made wtppnr.
- Given how little difference datasets seem to make at 5bpw+, I would personally just use the exl2 default settings for 20B and smaller models.

**Update**: I also made 2.4b and 5b randoms, but the 2.4b exploded and the 5b wasn't much different from the 5b wtppnr.