Qwen3.6-35B-A3B-java-v1 — GGUF quants

GGUF quantizations of schoggie/Qwen3.6-35B-A3B-java-v1 — a QLoRA fine-tune of Qwen/Qwen3.6-35B-A3B for agentic Java coding and long-context recall.

See the parent model card for training details, evaluation, and intended use.

Quants

File Size Recommended hardware Notes
qwen36-a3b-java-v1.BF16.gguf 65 GB re-quantization source Lossless reference, use to make new quant types
qwen36-a3b-java-v1.Q8_0.gguf 35 GB 48 GB+ GPU Near-lossless
qwen36-a3b-java-v1.Q6_K.gguf 27 GB 2× 16 GB GPU (production deploy) Recommended — used by maintainer at 200 K context on dual V100
qwen36-a3b-java-v1.Q5_K_M.gguf 24 GB 32 GB GPU
qwen36-a3b-java-v1.Q4_K_M.gguf 20 GB 24 GB single GPU imatrix-tuned
qwen36-a3b-java-v1.Q3_K_M.gguf 16 GB 20 GB GPU imatrix-tuned
qwen36-a3b-java-v1.IQ2_M.gguf 11 GB 16 GB consumer GPU imatrix-tuned, useful floor

The qwen36-a3b-java-v1.imatrix.dat (192 MB) and calibration_java.txt (Java-domain calibration corpus used to generate the importance matrix) are included for reproducibility / re-quantization with different bit widths.

Usage

llama.cpp server

llama-server -m qwen36-a3b-java-v1.Q6_K.gguf \
  --host 0.0.0.0 --port 8080 \
  -ngl 99 -c 32768 --jinja -fa on -fit off

Ollama

ollama create qwen36-a3b-java-v1 -f Modelfile   # FROM ./qwen36-a3b-java-v1.Q6_K.gguf
ollama run qwen36-a3b-java-v1

LM Studio

Drop the .gguf into your models directory and load via the UI.

Note on llama.cpp loader. Stock upstream llama.cpp has known loader bugs on the Qwen3.6-A3B GGUF metadata path. Use the unsloth-maintained fork until the upstream patch lands.

License

Inherits the Qwen Research License from the base model.

Downloads last month
677
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for schoggie/Qwen3.6-35B-A3B-java-v1-GGUF

Quantized
(1)
this model