Update README.md
README.md
CHANGED
@@ -17,12 +17,12 @@ Please be sure to set experts per token to 4 for the best results! Context lengt
 # Quanitized versions
 
 EXL2 (for fast GPU-only inference): <br />
-8_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-8_0bpw (~ 25
-6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (~
-5_0bpw: [coming soon] (16
-4_25bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw (14
-3_5bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_5bpw (12
-3_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_0bpw (11
+8_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-8_0bpw (~ 25 GB vram) <br />
+6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (~ 19 GB vram) <br />
+5_0bpw: [coming soon] (~ 16 GB vram) <br />
+4_25bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw (~ 14 GB vram) <br />
+3_5bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_5bpw (~ 12 GB vram) <br />
+3_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_0bpw (~ 11 GB vram)
 
 GGUF (for mixed GPU+CPU inference or CPU-only inference): <br />
 https://huggingface.co/mradermacher/WizardLM-2-4x7B-MoE-GGUF <br />