kaiokendev committed • Commit b65c13a • 1 Parent(s): 58b88ed
Update README.md
README.md CHANGED
@@ -6,7 +6,16 @@ license: mit
 
 This is a second prototype of SuperHOT, this time with 4K context and no RLHF. In my testing, it can go all the way to 6K without breaking down, and I made the change with the intention of reaching 8K, so I will assume it goes to 8K even though I only trained on 4K sequences.
 
-
+#### Looking for Merged & Quantized Models?
+- 13B 8K GGML: [tmpupload/superhot-13b-8k-no-rlhf-test-GGML](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GGML)
+- 13B 8K CUDA (no groupsize): [tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ)
+- 13B 8K CUDA 32g: [tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ)
+
+In order to use the 8K context, you will need to apply the monkeypatch I have added in this repo -- **without it, it will not work**.
+
+I will repeat: **Without the patch, it will not work!**
+
+The patch is very simple, and you can make the changes yourself:
 - Increase `max_position_embeddings` to 8192 to stretch the sinusoidal position embeddings
 - Stretch the frequency steps by a scale of `0.25`
 
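For reference, here is a minimal sketch of what those two changes look like when applied as a monkeypatch. It is written against the rotary-embedding class in `transformers.models.llama.modeling_llama` as it existed around transformers v4.30; the class internals have changed in later releases, and the `ScaledRotaryEmbedding` name here is illustrative, not necessarily the name used in this repo's patch file:

```python
# Sketch of the two changes described above, assuming the ~v4.30
# transformers LLaMA rotary embedding (internals differ in later versions).
import torch
import transformers.models.llama.modeling_llama as llama


class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)

        # Change 1: stretch the position cache to 8192 entries.
        self.max_seq_len_cached = 8192
        t = torch.arange(
            self.max_seq_len_cached, device=self.inv_freq.device, dtype=self.inv_freq.dtype
        )

        # Change 2: stretch the frequency steps by a scale of 0.25, so that
        # positions 0..8191 are interpolated into the 0..2047 range the base
        # model saw during pretraining.
        t = t * 0.25

        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        # Return the precomputed cos/sin tables sliced to the current length.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )


# Swap the class in before the model is instantiated:
llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```

Because the cos/sin cache is built in `__init__`, the patch must be applied before the model is loaded; you will also want `max_position_embeddings: 8192` in the model's `config.json` so nothing downstream truncates the context.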