Commit f74565f
1 Parent(s): 2750ab8

Update README.md
README.md
CHANGED
@@ -34,6 +34,8 @@ I recommend exl2 quantizations profiled on data similar to the desired task. It
 
 Lonestriker has also uploaded more general purpose quantizations here: https://huggingface.co/models?sort=trending&search=LoneStriker+Yi-34B-200K-DARE-megamerge-v8
 
+Additionally, TheBloke has uploaded experimental GGUFs using llama.cpp's new imatrix quantization feature, profiled on VMware open-instruct: https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF
+
 To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2, litellm or unsloth.
 
 
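The `max_position_embeddings` edit described in the diff can also be applied at load time rather than by hand-editing config.json. A minimal sketch, assuming the merge is published under the repo id shown below (substitute the actual path) and that a reduced 32K window fits your hardware:

```python
# Minimal sketch: load the merge in transformers with a reduced context window,
# mirroring the README's advice to lower max_position_embeddings below 200,000.
# The repo id and the 32768 value are assumptions, not part of the original README.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"  # assumed repo id

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 32768  # any value well below 200,000

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,      # overrides the 200K default from config.json
    device_map="auto",  # requires accelerate; adjust for your setup
)
```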