brucethemoose committed on
Commit 229a4a7 · verified · 1 Parent(s): 65bec42

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -28,7 +28,7 @@ It might recognize ChatML, and possibly Alpaca-like formats. Raw prompting as de
 
 ## Running
 
-24GB GPUs can run 4bpw Yi-34B-200K models at **45K context** with exllamav2, and performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/)
+24GB GPUs can run 3.1bpw Yi-34B-200K models at **75K context** with exllamav2, and performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/)
 
 
 Being a Yi model, try running a lower temperature with 0.05+ MinP, a little repetition penalty, maybe mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default, and it really needs a low temperature + MinP to cull the huge vocabulary.
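
To illustrate the sampler advice in the README text above, here is a minimal sketch of how a low temperature combined with MinP trims the candidate tokens. It is not taken from exllamav2 or exui. The function name `min_p_filter`, the default `temperature=0.7`, and the choice to apply temperature before the MinP cutoff are assumptions made for illustration; real backends may order these steps differently.

```python
import math


def min_p_filter(logits, temperature=0.7, min_p=0.05):
    """Hypothetical helper: temperature-scale logits, then apply a MinP cutoff.

    Returns renormalized probabilities; tokens below the cutoff get 0.0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # A lower temperature sharpens the distribution before filtering.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # MinP keeps only tokens whose probability is at least
    # min_p times the probability of the most likely token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving tokens sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


if __name__ == "__main__":
    # Four made-up logits: two plausible tokens and a long tail.
    print(min_p_filter([5.0, 4.0, 0.0, -3.0], temperature=1.0, min_p=0.05))
```

Because the cutoff scales with the top token's probability, it removes more of the tail when the model is confident and less when it is uncertain, which is one reason it pairs well with a large vocabulary.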