kaiokendev committed
Commit b65c13a
1 Parent(s): 58b88ed

Update README.md

Files changed (1): README.md (+10 −1)
README.md CHANGED
@@ -6,7 +6,16 @@ license: mit
 
 This is a second prototype of SuperHOT, this time with 4K context and no RLHF. In my testing, it can go all the way to 6K without breaking down, and I made the change with the intention of reaching 8K, so I'll assume it will go to 8K although I only trained on 4K sequences.
 
-In order to use the 8K context, you will need to apply the monkeypatch I have added in this repo -- without it, it will not work. The patch is very simple, and you can make the changes yourself:
+#### Looking for Merged & Quantized Models?
+- 13B 8K GGML: [tmpupload/superhot-13b-8k-no-rlhf-test-GGML](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GGML)
+- 13B 8K CUDA (no groupsize): [tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ)
+- 13B 8K CUDA 32g: [tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ)
+
+In order to use the 8K context, you will need to apply the monkeypatch I have added in this repo -- **without it, it will not work**.
+
+I will repeat: **Without the patch, it will not work!**
+
+The patch is very simple, and you can make the changes yourself:
 - Increase the `max_position_embeddings` to 8192 to stretch the sinusoidal
 - Stretch the frequency steps by a scale of `0.25`
 
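For readers who want to see what those two bullet points amount to in code, here is a minimal sketch, not the repo's actual patch file: a scaled rotary embedding written in the style of the Hugging Face `transformers` LLaMA implementation. The class name and the `inv_freq`/`cos_cached`/`sin_cached` buffer names follow that library's conventions and are assumptions about the surrounding code, not guarantees.

```python
# Sketch of the idea described above: stretch the rotary frequency steps
# by 0.25 so that positions 0..8191 are squeezed into the 0..2047 range
# the base model was trained on (position t behaves like position t * 0.25).
import torch


class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=8192, base=10000, scale=0.25):
        super().__init__()
        # Standard RoPE inverse frequencies, one per pair of head dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)

        # The stretch: scale the position indices before building the cache,
        # e.g. position 8191 lands at 8191 * 0.25 = 2047.75, inside the
        # original 2K training range.
        t = torch.arange(max_position_embeddings).float() * scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :])
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :])

    def forward(self, x, seq_len):
        # Return the precomputed cos/sin tables truncated to the current length.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(x.dtype),
        )
```

The first bullet then corresponds to setting `max_position_embeddings = 8192` in the model config, so the cache above covers the full 8K range; a monkeypatch in this spirit would swap the model's rotary embedding modules for something like this class before running inference.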