Panchovix/Guanaco-33B-SuperHOT-8K-4bit-32g

Panchovix committed on Jul 6, 2023

Commit 70a3c3d • 1 Parent(s): a99f419

Update README.md
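
The README change in this commit replaces a per-context rule for `compress_pos_emb` with a single fixed value. As a minimal sketch of the before and after, the two rules can be written as small Python helpers. The function names are hypothetical and exist only for illustration; they are not part of exllama or any other library. The likely rationale (that the SuperHOT 8K variant expects a fixed factor of 4) is an assumption and is not stated in the commit.

```python
# Hypothetical helpers that restate the README rules; not an exllama API.

MAX_CONTEXT = 8192  # upper bound named in the README


def compress_pos_emb_before(max_seq_len: int) -> int:
    """Rule removed by this commit: the factor depended on the context length."""
    table = {4096: 2, 8192: 4}
    if max_seq_len not in table:
        raise ValueError("the old README only listed 4096 and 8192")
    return table[max_seq_len]


def compress_pos_emb_after(max_seq_len: int) -> int:
    """Rule added by this commit: always 4 for any context up to 8192."""
    if not 0 < max_seq_len <= MAX_CONTEXT:
        raise ValueError("the README only covers contexts up to 8192")
    return 4


if __name__ == "__main__":
    for ctx in (4096, 8192):
        print(ctx, compress_pos_emb_before(ctx), compress_pos_emb_after(ctx))
```

The only behavioural difference within the documented range is at a context of 4096, where the recommended value changes from 2 to 4.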

Files changed (1)

README.md +1 -5

README.md CHANGED

@@ -7,11 +7,7 @@ It was created with GPTQ-for-LLaMA with group size 32 and act order true as para
 I HIGHLY suggest to use exllama, to evade some VRAM issues.
-Use (max_seq_len = context):
-If max_seq_len  = 4096, compress_pos_emb = 2
-If max_seq_len  = 8192, compress_pos_emb = 4
+Use compress_pos_emb = 4 for any context up to 8192 context.
 If you have 2x24 GB VRAM GPUs cards, to not get Out of Memory errors at 8192 context, use: