marin-32b-base-GGUF

GGUF quantizations of marin-community/marin-32b-base, a 32B dense base model (Qwen3 architecture).

These were produced with llama.cpp (convert_hf_to_gguf.py → f16 GGUF → llama-quantize).
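
The pipeline above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used for these files: the checkpoint directory, the llama.cpp checkout location, the build/bin path, and the intermediate file name are all assumptions. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/usr/bin/env sh
# Sketch of the convert -> quantize pipeline. All paths are hypothetical.
set -eu

MODEL_DIR="${MODEL_DIR:-./marin-32b-base}"  # assumed local copy of the HF checkpoint
LLAMA_CPP="${LLAMA_CPP:-./llama.cpp}"       # assumed llama.cpp checkout with built binaries
F16="marin-32b-base-f16.gguf"               # hypothetical name for the intermediate f16 file

# Print each command; execute it only when RUN=1 (dry run otherwise).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

build_all() {
  # Step 1: Hugging Face checkpoint -> f16 GGUF
  run python "$LLAMA_CPP/convert_hf_to_gguf.py" "$MODEL_DIR" \
    --outfile "$F16" --outtype f16

  # Step 2: f16 GGUF -> each quantization type offered in this repository
  for q in Q4_K_M Q5_K_M Q6_K Q8_0; do
    run "$LLAMA_CPP/build/bin/llama-quantize" "$F16" "marin-32b-base-$q.gguf" "$q"
  done
}

build_all
```

Quantizing from a single f16 intermediate means the conversion step runs once, and each llama-quantize call reads the same input file.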

Available quantizations

File                         Quant    Notes
marin-32b-base-Q4_K_M.gguf   Q4_K_M   4-bit, medium: good size/quality balance (recommended default)
marin-32b-base-Q5_K_M.gguf   Q5_K_M   5-bit, medium: higher quality, larger
marin-32b-base-Q6_K.gguf     Q6_K     6-bit: near-lossless
marin-32b-base-Q8_0.gguf     Q8_0     8-bit: effectively lossless vs f16

Source

  • Base model: marin-community/marin-32b-base (Apache-2.0)
  • Architecture: Qwen3 (Qwen3ForCausalLM), 64 layers, hidden 5120, vocab 128256
  • This is a base (non-instruct) model; there is no chat template.

Usage (llama.cpp)

./llama-cli -m marin-32b-base-Q4_K_M.gguf -p "Your prompt here"
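
Because this is a base model with no chat template, prompts work best as text to be continued rather than as instructions. The following is a minimal sketch of fetching one file and running a completion. It assumes the huggingface_hub command-line tool (huggingface-cli) is installed, that llama-cli is in the current directory, and that the repository id is laion/marin-32b-base-GGUF; the prompt and token limit are illustrative. By default it only prints the commands; set RUN=1 to execute them.

```shell
# Sketch: download one quantization and run plain text completion.
REPO="laion/marin-32b-base-GGUF"      # assumed repository id
FILE="marin-32b-base-Q4_K_M.gguf"     # the recommended default quantization

# Print each command; execute it only when RUN=1 (dry run otherwise).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

fetch_and_complete() {
  # Download a single GGUF file into the current directory.
  run huggingface-cli download "$REPO" "$FILE" --local-dir .

  # Phrase the prompt as the start of a passage for the model to continue.
  # -n caps the number of generated tokens.
  run ./llama-cli -m "$FILE" -p "The three primary colors are" -n 64
}

fetch_and_complete
```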