Better quants based on the f16 are available here: https://huggingface.co/qwp4w3hyb/Cerebrum-1.0-8x7b-iMat-GGUF

Model Card for Cerebrum-1.0-8x7b-imatrix-GGUF

Quantized from https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b with llama.cpp commit 46acb3676718b983157058aecf729a2064fc7d34, using an importance matrix.
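For reference, applying an imatrix during quantization with llama.cpp looks roughly like the sketch below. The file names and the IQ3_XXS target type are illustrative placeholders, not the exact invocation used for this repo.

```bash
# Sketch: produce an imatrix-guided quant with llama.cpp's quantize tool.
# File names and the IQ3_XXS quant type are placeholders.
./quantize --imatrix cerebrum-1.0-8x7b.imatrix \
    cerebrum-1.0-8x7b-f16.gguf \
    cerebrum-1.0-8x7b-IQ3_XXS.gguf IQ3_XXS
```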

Quants are being uploaded over a slow German internet connection, so they will appear one by one. Stay tuned.

imatrix generated with:

./imatrix -ofreq 4 -b 512 -c 512 -t 14 --chunks 24 \
    -m ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-Q8_0.gguf \
    -f ./groups_merged.txt

using the calibration dataset from here: https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

Sadly, this means the imatrix was generated from the Q8_0 quant instead of the unquantized f16, as it ideally should be; I can't get it to work with the f16 on my machine at the moment. It should still improve the quality of the quants, though.
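Once a quant has finished uploading, it can be run with llama.cpp as usual. A minimal sketch, assuming one of the 4-bit files (the model file name and prompt are placeholders):

```bash
# Sketch: run a downloaded quant with llama.cpp's main binary.
# Model file name and prompt are placeholders.
./main -m ./cerebrum-1.0-8x7b-IQ4_XS.gguf \
    -c 4096 -n 256 \
    -p "Explain what an importance matrix is in llama.cpp quantization."
```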

Format: GGUF
Model size: 46.7B params
Architecture: llama
Available quant types: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit
