brucethemoose committed on
Commit f74565f
1 Parent(s): 2750ab8

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -34,6 +34,8 @@ I recommend exl2 quantizations profiled on data similar to the desired task. It
 
 Lonestriker has also uploaded more general purpose quantizations here: https://huggingface.co/models?sort=trending&search=LoneStriker+Yi-34B-200K-DARE-megamerge-v8
 
+Additionally, TheBloke has uploaded experimental GGUFs using llama.cpp's new imatrix quantization feature, profiled on VMware open-instruct: https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF
+
 To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2, litellm or unsloth.
 
 
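
As a concrete illustration of the `max_position_embeddings` note in the diff above, here is a minimal sketch of loading the model in transformers with a reduced context window. The repo id `brucethemoose/Yi-34B-200K-DARE-megamerge-v8` and the 32768-token limit are assumptions for illustration, not values taken from the commit:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed repo id -- substitute the actual model path you are loading.
MODEL_ID = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"

# Shrink the context window before the weights are loaded, instead of
# editing config.json by hand. 32768 is an arbitrary example value;
# anything below 200,000 that fits your memory budget works.
config = AutoConfig.from_pretrained(MODEL_ID)
config.max_position_embeddings = 32768

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    config=config,
    device_map="auto",   # requires the `accelerate` package
    torch_dtype="auto",
)
```

Passing a modified config this way has the same effect as lowering `max_position_embeddings` in config.json directly, which is what the README instructs.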