---
language:
  - en
library_name: transformers
license: mit
quantized_by: mradermacher
---

## About

static quants of https://huggingface.co/NobodyExistsOnTheInternet/Llama-2-70b-x8-MoE-clown-truck

How did so many fit into that?

## Usage

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.
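Concatenation means joining the split download parts, in order, into a single GGUF file before loading it. A minimal sketch of that step is below; the part filenames are hypothetical stand-ins (use the actual names from the download links), and tiny dummy files stand in for the real multi-GB parts:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# Simulate three downloaded parts (stand-ins for the real multi-GB files;
# real names typically look like model.Q2_K.gguf.part1of4 etc.)
printf 'AAA' > model.Q2_K.gguf.part1of3
printf 'BBB' > model.Q2_K.gguf.part2of3
printf 'CCC' > model.Q2_K.gguf.part3of3

# Join the parts IN ORDER into one file; the result is a plain
# byte-wise concatenation, which is all a split GGUF needs.
cat model.Q2_K.gguf.part1of3 \
    model.Q2_K.gguf.part2of3 \
    model.Q2_K.gguf.part3of3 > model.Q2_K.gguf

cat model.Q2_K.gguf   # → AAABBBCCC
```

After joining, point your GGUF loader at the single combined file; the individual part files can then be deleted.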

## Provided Quants

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| PART 1 PART 2 PART 3 PART 4 | Q2_K | 170.8 | |
| PART 1 PART 2 PART 3 PART 4 PART 5 | Q3_K_M | 223.1 | lower quality |
| PART 1 PART 2 PART 3 PART 4 PART 5 PART 6 | Q4_K_M | 282.0 | fast, medium quality |

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):


And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9