Llamacpp Quantizations of DeepSeek-V4-Flash by deepseek-ai

Using llama.cpp release b9843 for quantization.

Original model: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

The original model is released in MXFP4, so it is provided here in MXFP4 format only.

Unfortunately, no other sizes can be provided, because MXFP4 weights do not quantize properly to other formats.

Run in your choice of tools.

Note: since this is a newly supported model, you may need to wait for an update from the developers of your tool.

Prompt format

No prompt format found

Download the MXFP4 files:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| DeepSeek-V4-Flash-MXFP4.gguf | MXFP4 | 156.00GB | true | Original quality. |
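As a rough sanity check on the file size, the sketch below estimates the average bits stored per weight from the 156.00GB figure in the table and the 284B parameter count reported on the model page. It assumes the size is in decimal gigabytes and that the parameter count is rounded, so the result is only approximate. The nominal MXFP4 cost (4-bit elements plus one 8-bit shared scale per block of 32) is included for comparison.

```python
# Rough bits-per-weight estimate from the figures on this page.
FILE_SIZE_GB = 156.00   # from the table above (assumed to be decimal GB)
PARAM_COUNT = 284e9     # "284B params" as reported on the model page (rounded)


def bits_per_weight(file_size_gb: float, param_count: float) -> float:
    """Average number of bits stored per parameter."""
    return file_size_gb * 1e9 * 8 / param_count


# Nominal MXFP4 cost: 4-bit elements plus one 8-bit scale shared by 32 elements.
MXFP4_NOMINAL_BPW = 4 + 8 / 32

observed = bits_per_weight(FILE_SIZE_GB, PARAM_COUNT)
print(f"observed:      {observed:.2f} bits/weight")
print(f"nominal MXFP4: {MXFP4_NOMINAL_BPW:.2f} bits/weight")
```

Under these assumptions the estimate comes out near 4.39 bits per weight, slightly above the nominal 4.25. A gap like this would be consistent with some tensors being stored at higher precision, though this page does not confirm that.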

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then download all parts of the split model into the current directory:

huggingface-cli download bartowski/DeepSeek-V4-Flash-GGUF --include "DeepSeek-V4-Flash-MXFP4*" --local-dir ./

You can either specify a new local-dir (for example, DeepSeek-V4-Flash-MXFP4) or download all the files in place (./).
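The same download can also be scripted from Python. This is a minimal sketch that uses `snapshot_download` from the `huggingface_hub` package with the same repository and include pattern as the command above. The helper names `select_files` and `download` and the example file listing are hypothetical, and the number of split parts shown is made up purely for illustration.

```python
from fnmatch import fnmatch

REPO_ID = "bartowski/DeepSeek-V4-Flash-GGUF"
INCLUDE_PATTERN = "DeepSeek-V4-Flash-MXFP4*"


def select_files(filenames, pattern=INCLUDE_PATTERN):
    """Return the filenames that the glob-style include pattern matches."""
    return [name for name in filenames if fnmatch(name, pattern)]


def download(local_dir="./"):
    """Download every repository file matching INCLUDE_PATTERN into local_dir."""
    # Imported lazily so select_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[INCLUDE_PATTERN],
        local_dir=local_dir,
    )


# Hypothetical listing: the real part names and part count may differ.
example_listing = [
    "README.md",
    "DeepSeek-V4-Flash-MXFP4/DeepSeek-V4-Flash-MXFP4-00001-of-00004.gguf",
    "DeepSeek-V4-Flash-MXFP4/DeepSeek-V4-Flash-MXFP4-00002-of-00004.gguf",
]
for name in select_files(example_listing):
    print("would download:", name)

# download("./DeepSeek-V4-Flash-MXFP4")  # uncomment to start the large download
```

The actual download call is left commented out because the files total roughly 156GB. Pass a different `local_dir` to place the files in a new folder instead of the current one.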

Credits

Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.

Thank you ZeroWw for the inspiration to experiment with embed/output.

Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

Format: GGUF
Model size: 284B params
Architecture: deepseek4