cgus committed on
Commit 795735d
1 Parent(s): 8e45b1c

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -30,7 +30,7 @@ Quantized with Exllamav2 0.0.11 with default dataset.
 ## My notes about this model:
 I tried to load the 4bpw version of the model in Text-Generation-WebUI, but it didn't set RoPE scaling automatically despite it being defined in the config file.
 With high context it starts writing gibberish when RoPE scaling isn't set, so I checked it with 4x compress_pos_emb for a 32k max context, and it was able to retrieve details from a 16000-token prompt.
-With my 12GB VRAM GPU I could load the model with about 30000 tokens, or 32768 tokens with the 8bit cache option.
+With my 12GB VRAM GPU I could load the 4bpw version with about 30000 tokens, or 32768 tokens with the 8bit cache option.
 
 ## How to run
 
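
To illustrate why the 4x compress_pos_emb setting matters: linear RoPE scaling divides each position index by the compression factor, so a 32768-token prompt maps back into the position range the model was trained on (32768 / 4 = 8192, the native context the 4x factor implies). A minimal sketch of the idea — the helper name is hypothetical, not Text-Generation-WebUI's actual implementation:

```python
def scaled_positions(seq_len: int, compress_pos_emb: int) -> list[float]:
    # Linear RoPE scaling (what compress_pos_emb controls): divide every
    # position index by the compression factor so long sequences stay
    # within the position range the model saw during training.
    return [p / compress_pos_emb for p in range(seq_len)]

# With 4x compression, the largest position in a 32768-token prompt
# (32767 / 4 = 8191.75) stays inside the assumed 8192-token native range.
pos = scaled_positions(32768, 4)
```

Without this scaling every position past the native range falls outside the trained rotary embedding range, which matches the gibberish the notes describe at high context.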