|
--- |
|
license: apache-2.0 |
|
tags: |
|
- mixtral |
|
- conversational |
|
- finetune |
|
--- |
|
|
|
# Model Card for Cerebrum-1.0-8x7b-imatrix-GGUF |
|
|
|
Quantized from https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b |
|
using llama.cpp commit 46acb3676718b983157058aecf729a2064fc7d34 with an importance matrix.
|
|
|
Quants are being uploaded over a slow German internet connection, so they will appear one by one. Stay tuned.
|
|
|
The imatrix was generated with:
|
|
|
```
./imatrix -ofreq 4 -b 512 -c 512 -t 14 --chunks 24 -m ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-Q8_0.gguf -f ./groups_merged.txt
```
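For reference, an imatrix produced this way can be passed to llama.cpp's `quantize` tool via its `--imatrix` flag. The paths and quant type below are a hypothetical sketch, not the exact commands used for this upload:

```shell
# Sketch only: file paths and the Q4_K_M target are assumptions.
# The imatrix file (imatrix.dat by default) guides the quantizer so that
# low-bit quants preserve the most important weights more accurately.
./quantize --imatrix imatrix.dat \
  ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-f16.gguf \
  ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-Q4_K_M.gguf \
  Q4_K_M
```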
|
|
|
Note that the imatrix was generated from the Q8_0 quant rather than the unquantized f16, as it ideally should be; I currently can't get it to work with the f16 on my machine. It should still improve the quality of the quants, though.
|
|
|
|