NOTE: The parent model has been pulled offline. Consider these quants to be outdated/deprecated.

Rio-3.5-Open-397B GGUF Quants

This repository contains GGUF quantizations of prefeitura-rio/Rio-3.5-Open-397B.

Rio-3.5-Open-397B is based on Qwen3.5-397B-A17B. These GGUF files were converted with b9619 llama.cpp and quantized for llama.cpp testing.

See llama.cpp github for details on llama.cpp: https://github.com/ggml-org/llama.cpp

Files

File Quant MTP Notes
Rio-3.5-Open-397B-Q6_K-MTP.gguf Q6_K yes High-quality quant, ~308 GiB
Rio-3.5-Open-397B-IQ4_XS-MTP.gguf IQ4_XS yes iMatrix-assisted quant, ~200 GiB

Quantization notes

The IQ4_XS quant was created using Unsloth's published iMatrix for Qwen3.5-397B-A17B-MTP:

The MTP layer is retained:

  • qwen35moe.block_count = 61
  • qwen35moe.nextn_predict_layers = 1

Note: the published Unsloth iMatrix did not include weights for the final blk.60.* MTP tensors, so those tensors were quantized without iMatrix weighting. The main model layers used the iMatrix.

Example llama.cpp launch

llama-server \
  --model Rio-3.5-Open-397B-IQ4_XS-MTP.gguf \
  --ctx-size 262144 \
  --parallel 1 \
  --n-gpu-layers 999 \
  --flash-attn on \
  --cache-type-k bf16 \
  --cache-type-v bf16 \
  --spec-type draft-mtp \
  --spec-draft-n-max 3 \
  --spec-draft-type-k q8_0 \
  --spec-draft-type-v q8_0 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0

Attribution

  • Parent model: prefeitura-rio/Rio-3.5-Open-397B
  • Base model family: Qwen3.5-397B-A17B
  • iMatrix source for IQ4_XS: unsloth/Qwen3.5-397B-A17B-MTP-GGUF
  • Quantization performed independently by Foxipanda.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for foxipanda/Rio-3.5-Open-397B-GGUF

Quantized
(5)
this model