cgus committed c171ab6 (1 parent: 09a20a0): Update README.md

Files changed (1): README.md (+1 -1)
@@ -28,7 +28,7 @@ Created by: [upstage](https://huggingface.co/upstage)
 Quantized with Exllamav2 0.0.11 with default dataset.
 ## My notes about this model:
 I tried to load 4bpw version of the model in Text-Generation-WebUI but it didn't set RoPE scaling automatically despite it being defined in the config file.
-With high context it starts writing gibberish when RoPE scaling isn't set, so I checked it with 4x compress_pos_emb and it was able to retrieve details from 16000 token prompt.
+With high context it starts writing gibberish when RoPE scaling isn't set, so I checked it with 4x compress_pos_emb for 32k max context and it was able to retrieve details from 16000 token prompt.
 With my 12GB VRAM GPU I could load the model with about 30000 tokens or 32768 tokens with 8bit cache option.
 It's the first Yarn model that worked for me, perhaps other Yarn models required to set RoPE scaling manually too.
 
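Since the WebUI did not pick up the RoPE scaling from the model's config file, it has to be set by hand at launch. A minimal launch-command sketch matching the notes above, assuming the text-generation-webui command-line flags of that era (`--loader`, `--compress_pos_emb`, `--max_seq_len`, `--cache_8bit`; flag names may differ in other releases, and the model directory name is a placeholder):

```shell
# Hedged sketch: launch text-generation-webui with manual RoPE scaling.
# --compress_pos_emb 4  sets the 4x positional-embedding compression noted above
# --max_seq_len 32768   sets the 32k max context
# --cache_8bit          enables the 8-bit KV cache that let 32768 tokens fit in 12GB VRAM
python server.py \
    --loader exllamav2 \
    --model <your-4bpw-exl2-model-dir> \
    --compress_pos_emb 4 \
    --max_seq_len 32768 \
    --cache_8bit
```

Without `--compress_pos_emb 4` the model starts producing gibberish at high context, so the flag matters even though the scaling is already defined in the config file.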