brucethemoose committed on
Commit f74565f
1 Parent(s): 2750ab8

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -34,6 +34,8 @@ I recommend exl2 quantizations profiled on data similar to the desired task. It
 
 Lonestriker has also uploaded more general purpose quantizations here: https://huggingface.co/models?sort=trending&search=LoneStriker+Yi-34B-200K-DARE-megamerge-v8
 
+Additionally, TheBloke has uploaded experimental GGUFs using llama.cpp's new imatrix quantization feature, profiled on VMware open-instruct: https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF
+
 To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2, litellm or unsloth.
 
 
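
As a concrete illustration of the `max_position_embeddings` note in the diff above, here is a minimal sketch of loading the model in transformers with a reduced context window. The repo id `brucethemoose/Yi-34B-200K-DARE-megamerge-v8` and the 32768-token limit are assumptions for illustration, not values taken from the commit:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed repo id -- substitute the actual model path you are loading.
MODEL_ID = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"

# Shrink the context window before the weights are loaded, instead of
# editing config.json by hand. 32768 is an arbitrary example value;
# anything below 200,000 that fits your memory budget works.
config = AutoConfig.from_pretrained(MODEL_ID)
config.max_position_embeddings = 32768

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    config=config,
    device_map="auto",   # requires the `accelerate` package
    torch_dtype="auto",
)
```

Passing a modified config this way has the same effect as lowering `max_position_embeddings` in config.json directly, which is what the README instructs.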