Llamacpp Quantizations of bigstral-12b-32k-8xMoE

Using llama.cpp release b2354 for quantization.

Original model: https://huggingface.co/bartowski/bigstral-12b-32k-8xMoE

Download a file (not the whole branch) from below:

| Filename | Quant type | File size | Description |
| -------- | ---------- | --------- | ----------- |
| bigstral-12b-32k-8xMoE-Q8_0.gguf | Q8_0 | 86.63GB | Extremely high quality, generally unneeded but max available quant. |
| bigstral-12b-32k-8xMoE-Q6_K.gguf | Q6_K | 67.00GB | Very high quality, near perfect, recommended. |
| bigstral-12b-32k-8xMoE-Q5_K_M.gguf | Q5_K_M | 58.00GB | High quality, very usable. |
| bigstral-12b-32k-8xMoE-Q5_K_S.gguf | Q5_K_S | 56.25GB | High quality, very usable. |
| bigstral-12b-32k-8xMoE-Q5_0.gguf | Q5_0 | 56.25GB | High quality, older format, generally not recommended. |
| bigstral-12b-32k-8xMoE-Q4_K_M.gguf | Q4_K_M | 49.60GB | Good quality, similar to 4.25 bpw. |
| bigstral-12b-32k-8xMoE-Q4_K_S.gguf | Q4_K_S | 46.70GB | Slightly lower quality with small space savings. |
| bigstral-12b-32k-8xMoE-Q4_0.gguf | Q4_0 | 46.13GB | Decent quality, older format, generally not recommended. |
| bigstral-12b-32k-8xMoE-Q3_K_L.gguf | Q3_K_L | 42.16GB | Lower quality but usable, good for low RAM availability. |
| bigstral-12b-32k-8xMoE-Q3_K_M.gguf | Q3_K_M | 39.30GB | Even lower quality. |
| bigstral-12b-32k-8xMoE-Q3_K_S.gguf | Q3_K_S | 35.62GB | Low quality, not recommended. |
| bigstral-12b-32k-8xMoE-Q2_K.gguf | Q2_K | 30.17GB | Extremely low quality, not recommended. |
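
As a rough illustration of picking and fetching a single file rather than the whole branch, here is a minimal Python sketch. It is not an official workflow for this repo. The repository id `bartowski/bigstral-12b-32k-8xMoE-GGUF` is an assumption (the original model name plus the usual `-GGUF` suffix), and the `headroom_gb` default is an arbitrary placeholder for context and runtime overhead, not a measured value. The helper names `pick_quant`, `quant_filename`, and `download_quant` are hypothetical. The download step uses `hf_hub_download` from the `huggingface_hub` package.

```python
# File sizes in GB, copied from the table above (largest first).
QUANT_SIZES_GB = {
    "Q8_0": 86.63,
    "Q6_K": 67.00,
    "Q5_K_M": 58.00,
    "Q5_K_S": 56.25,
    "Q5_0": 56.25,
    "Q4_K_M": 49.60,
    "Q4_K_S": 46.70,
    "Q4_0": 46.13,
    "Q3_K_L": 42.16,
    "Q3_K_M": 39.30,
    "Q3_K_S": 35.62,
    "Q2_K": 30.17,
}

# Assumption: repo id inferred from the model name; adjust if it differs.
REPO_ID = "bartowski/bigstral-12b-32k-8xMoE-GGUF"


def pick_quant(budget_gb, headroom_gb=4.0):
    """Return the first (largest) quant whose file fits in the budget, or None.

    headroom_gb is an assumed allowance for context and runtime overhead.
    """
    limit = budget_gb - headroom_gb
    for name, size_gb in QUANT_SIZES_GB.items():  # dict keeps table order
        if size_gb <= limit:
            return name
    return None


def quant_filename(quant):
    """Build the file name used in the table for a given quant type."""
    return f"bigstral-12b-32k-8xMoE-{quant}.gguf"


def download_quant(quant, local_dir="."):
    """Download one GGUF file (not the whole repo) and return its local path."""
    # Imported lazily so the selection helpers work without the package.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant),
        local_dir=local_dir,
    )
```

For example, `pick_quant(64)` selects `Q5_K_M` under the default headroom, and `download_quant(pick_quant(64))` would then fetch only that file.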

Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
