NOTE: The parent model has been pulled offline. Consider these quants to be outdated/deprecated.

Rio-3.5-Open-397B GGUF Quants

Parent Model Card: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B
Original Model Card: https://huggingface.co/Qwen/Qwen3.5-397B-A17B

This repository contains GGUF quantizations of prefeitura-rio/Rio-3.5-Open-397B.

Rio-3.5-Open-397B is based on Qwen3.5-397B-A17B. These GGUF files were converted with b9619 llama.cpp and quantized for llama.cpp testing.

See llama.cpp github for details on llama.cpp: https://github.com/ggml-org/llama.cpp

Files

File	Quant	MTP	Notes
Rio-3.5-Open-397B-Q6_K-MTP.gguf	Q6_K	yes	High-quality quant, ~308 GiB
Rio-3.5-Open-397B-IQ4_XS-MTP.gguf	IQ4_XS	yes	iMatrix-assisted quant, ~200 GiB

Quantization notes

The IQ4_XS quant was created using Unsloth's published iMatrix for Qwen3.5-397B-A17B-MTP:

unsloth/Qwen3.5-397B-A17B-MTP-GGUF/imatrix_unsloth.gguf_file
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-MTP-GGUF

The MTP layer is retained:

qwen35moe.block_count = 61
qwen35moe.nextn_predict_layers = 1

Note: the published Unsloth iMatrix did not include weights for the final blk.60.* MTP tensors, so those tensors were quantized without iMatrix weighting. The main model layers used the iMatrix.

Example llama.cpp launch

llama-server \
  --model Rio-3.5-Open-397B-IQ4_XS-MTP.gguf \
  --ctx-size 262144 \
  --parallel 1 \
  --n-gpu-layers 999 \
  --flash-attn on \
  --cache-type-k bf16 \
  --cache-type-v bf16 \
  --spec-type draft-mtp \
  --spec-draft-n-max 3 \
  --spec-draft-type-k q8_0 \
  --spec-draft-type-v q8_0 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0

Attribution

Parent model: prefeitura-rio/Rio-3.5-Open-397B
Base model family: Qwen3.5-397B-A17B
iMatrix source for IQ4_XS: unsloth/Qwen3.5-397B-A17B-MTP-GGUF
Quantization performed independently by Foxipanda.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for foxipanda/Rio-3.5-Open-397B-GGUF

Base model

Qwen/Qwen3.5-397B-A17B

Finetuned

prefeitura-rio/Rio-3.5-Open-397B

Quantized

(5)

this model