kaiokendev committed • Commit b65c13a • 1 Parent(s): 58b88ed
Update README.md
README.md CHANGED
@@ -6,7 +6,16 @@ license: mit
 
 This is a second prototype of SuperHOT, this time with 4K context and no RLHF. In my testing, it can go all the way to 6K without breaking down, and I made the change with the intention of reaching 8K, so I will assume it goes to 8K even though I only trained on 4K sequences.
 
-
+#### Looking for Merged & Quantized Models?
+- 13B 8K GGML: [tmpupload/superhot-13b-8k-no-rlhf-test-GGML](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GGML)
+- 13B 8K CUDA (no groupsize): [tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ)
+- 13B 8K CUDA 32g: [tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ)
+
+In order to use the 8K context, you will need to apply the monkeypatch I have added in this repo -- **without it, it will not work**.
+
+I will repeat: **Without the patch, it will not work!**
+
+The patch is very simple, and you can make the changes yourself:
 - Increase `max_position_embeddings` to 8192 to stretch the sinusoidal position embeddings
 - Stretch the frequency steps by a scale of `0.25`
 
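For reference, here is a minimal sketch of what those two changes look like when applied as a monkeypatch. It is written against the rotary-embedding class in `transformers.models.llama.modeling_llama` as it existed around transformers v4.30; the class internals have changed in later releases, and the `ScaledRotaryEmbedding` name here is illustrative, not necessarily the name used in this repo's patch file:

```python
# Sketch of the two changes described above, assuming the ~v4.30
# transformers LLaMA rotary embedding (internals differ in later versions).
import torch
import transformers.models.llama.modeling_llama as llama


class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)

        # Change 1: stretch the position cache to 8192 entries.
        self.max_seq_len_cached = 8192
        t = torch.arange(
            self.max_seq_len_cached, device=self.inv_freq.device, dtype=self.inv_freq.dtype
        )

        # Change 2: stretch the frequency steps by a scale of 0.25, so that
        # positions 0..8191 are interpolated into the 0..2047 range the base
        # model saw during pretraining.
        t = t * 0.25

        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        # Return the precomputed cos/sin tables sliced to the current length.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )


# Swap the class in before the model is instantiated:
llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```

Because the cos/sin cache is built in `__init__`, the patch must be applied before the model is loaded; you will also want `max_position_embeddings: 8192` in the model's `config.json` so nothing downstream truncates the context.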