Doctor-Shotgun committed on
Commit
9e0ae0f
1 Parent(s): 6cc0a39

Update README.md

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -16,6 +16,10 @@ Created using [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-interm
 
 Of note, the base checkpoint used was from commit "final model" fad4f1a5cd0563ac41349b8fec2e6e51156568a0 which was subsequently reverted, and not the current main branch 3T checkpoint of TinyLlama-1.1B.
 
+[EXL2 Quants by turboderp](https://huggingface.co/turboderp/TinyLlama-1B-32k-exl2)
+
+The quantized model fits alongside a 4.25bpw 70B model at 32k sequence length on a single A6000 and provides noticeable speed-up with speculative decoding.
+
 ### Wikitext (wikitext-2-raw-v1_train) Perplexity (64 rows) as evaluated via [exllamav2](https://github.com/turboderp/exllamav2):
 
 | Model | 2048 | 4096 | 8192 | 16384 | 32768 |
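
The README line added in this commit says the quantized draft model fits next to a 4.25bpw 70B model on a single A6000. The weight-memory part of that claim can be roughly checked with the sketch below. It is a back-of-the-envelope estimate, not the author's measurement. The parameter counts (70e9 and 1.1e9) are nominal. The 4.0 bpw figure for the draft model and the 48 GiB card budget are assumptions for illustration. KV cache and activation memory at 32k context are not modeled, so the remaining headroom is an upper bound on what is left for them.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in bytes (ignores per-tensor overhead)."""
    return n_params * bits_per_weight / 8


# 4.25 bpw comes from the README line; 70e9 is the nominal parameter count.
main_model = weight_bytes(70e9, 4.25)

# Hypothetical: the bitrate of the draft quant is not stated in the commit.
draft_model = weight_bytes(1.1e9, 4.0)

# Assumption: treat the A6000 as a 48 GiB budget.
vram_budget = 48 * GIB

# Whatever is left must cover KV cache, activations, and runtime overhead.
headroom = vram_budget - main_model - draft_model

print(f"70B weights:   {main_model / GIB:.2f} GiB")
print(f"draft weights: {draft_model / GIB:.2f} GiB")
print(f"headroom:      {headroom / GIB:.2f} GiB")
```

Under these assumptions the draft model's weights are a small fraction of the main model's, which is consistent with the README's point that it can ride along for speculative decoding.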