Zen-5-Coder GGUF

GGUF quantizations of Zen-5-Coder 80B for llama.cpp and compatible runtimes.

The original model was released by Zen LM in Hugging Face Transformers format. This repository provides converted and quantized GGUF versions optimized for local inference across a wide range of hardware configurations.


Overview

Property Value
Model Zen-5-Coder
Architecture Mixture of Experts (MoE)
Parameters 80B
Original Format Hugging Face Transformers
GGUF Conversion llama.cpp
Repository Maintainer VibeManGeo

Available Quantizations

Quantization Description
Q2_K Lowest memory usage
Q3_K_M Balanced low-memory option
Q4_K_M Recommended default
Q5_K_M Higher quality generation
Q6_K Near-lossless experience
Q8_0 Maximum GGUF quality
FP16 Unquantized reference model

Conversion Pipeline

All files were generated locally using the standard llama.cpp workflow:

Hugging Face Transformers
        โ†“
GGUF FP16
        โ†“
GGUF Quantization

Tools Used

  • llama.cpp
  • convert_hf_to_gguf.py
  • llama-quantize

Example Usage

llama.cpp

llama-cli \
  -m Zen-5-Coder-Q4_K_M.gguf \
  -c 32768 \
  -ngl 999 \
  -p "Write a Python web server"

llama-server

llama-server \
  -m Zen-5-Coder-Q4_K_M.gguf \
  -c 32768 \
  --host 127.0.0.1 \
  --port 8080

Hardware Used For Conversion

The quantizations in this repository were generated and tested on:

  • GPU 0 NVIDIA RTX 3060 12 GB Headless
  • GPU 1 NVIDIA Tesla P40 24 GB Headless
  • AMD Ryzen 7 5700G
  • 64 GB DDR-4 3200Mhz System RAM
  • Debian Linux 13.2

Actual performance will depend on context size, quantization level, GPU offloading, and runtime configuration.


Credits

Original Model

Zen LM โ€” creators of Zen-5-Coder.

GGUF Conversion & Quantization

VibeManGeo

Fun fact: these 80B quantizations were produced before the author passed CompTIA A+ Core 1.


Acknowledgements

Special thanks to the llama.cpp developers for providing the tools that make efficient local inference and GGUF quantization possible.


Disclaimer

This repository contains converted and quantized derivatives of the original model.

All credit for model architecture, training, datasets, and original weights belongs to the original authors.


Support the Original Authors

If these GGUF files save you the time and compute resources required for conversion and quantization, please consider supporting the original creators by visiting the original Zen-5-Coder model page.

Notes

These GGUF files were independently converted and quantized from the original Hugging Face release using llama.cpp.

The goal of this repository is to make Zen-5-Coder immediately accessible to the local inference community without requiring users to perform the conversion process themselves.

Downloads last month
370
GGUF
Model size
80B params
Architecture
qwen3next
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support