---
language:
- en
library_name: transformers
license: mit
quantized_by: mradermacher
---
## About
static quants of https://huggingface.co/NobodyExistsOnTheInternet/Llama-2-70b-x8-MoE-clown-truck
How did so many fit into that?
## Usage
If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.
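For example, here is a minimal Python sketch of concatenating split GGUF parts into a single file. The part filenames below are assumptions for illustration; use the actual filenames listed in this repository, in order.

```python
# Sketch: concatenate multi-part GGUF files into one file.
# The part names below are assumed for illustration; check the repo's file listing.
import shutil

parts = [
    "Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf.part1of4",  # assumed naming
    "Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf.part2of4",
    "Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf.part3of4",
    "Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf.part4of4",
]

with open("Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            # Stream the bytes so a ~170 GB quant never has to fit in RAM.
            shutil.copyfileobj(f, out)
```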
## Provided Quants
(sorted by size, not necessarily quality. IQ-quants are often preferable over similarly sized non-IQ quants)
| Link | Type | Size/GB | Notes |
|:---|:---|---:|:---|
| PART 1 PART 2 PART 3 PART 4 | Q2_K | 170.8 | |
| PART 1 PART 2 PART 3 PART 4 PART 5 | Q3_K_M | 223.1 | lower quality |
| PART 1 PART 2 PART 3 PART 4 PART 5 PART 6 | Q4_K_M | 282.0 | fast, medium quality |
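As a hedged sketch of fetching the parts of one quant before concatenating them as shown in the Usage section, the `huggingface_hub` library can be used. The repo id and filenames below are assumptions; adjust them to the actual file listing.

```python
# Sketch: download all parts of the Q2_K quant from the Hub.
# repo_id and filenames are assumptions for illustration.
from huggingface_hub import hf_hub_download

repo_id = "mradermacher/Llama-2-70b-x8-MoE-clown-truck-GGUF"  # assumed repo id
filenames = [
    f"Llama-2-70b-x8-MoE-clown-truck.Q2_K.gguf.part{i}of4" for i in range(1, 5)
]

local_paths = [hf_hub_download(repo_id=repo_id, filename=name) for name in filenames]
print(local_paths)  # concatenate these in order, as in the Usage example above
```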
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9