Phi-4-mini-instruct GGUF Models

This repository contains the Phi-4-mini-instruct model quantized using a specialized branch of llama.cpp:
🔗 ns3284/llama.cpp

Special thanks to @nisparks for adding support for Phi-4-mini-instruct in llama.cpp.
This branch is expected to be merged into master soon; once that happens, it is recommended to use the main llama.cpp repository instead.
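
For local inference, a minimal sketch using the llama-cpp-python bindings is shown below. It assumes the bindings were built against a llama.cpp version that includes Phi-4-mini-instruct support (the ns3284 branch, or master once the merge lands); the file name, context size, and prompt are illustrative.

```python
from llama_cpp import Llama

# Load one of the GGUF files from this repo (path and context size are illustrative).
llm = Llama(model_path="phi-4-mini-q4_k_l.gguf", n_ctx=4096)

# Run a simple chat completion and print the reply.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what GGUF is in one sentence."}],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```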


Included Files

phi-4-mini-bf16.gguf

  • Model weights preserved in BF16.
  • Use this if you want to requantize the model into a different format.
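
If you do requantize, a minimal sketch is shown below; it assumes a local llama.cpp build that includes the llama-quantize tool, and the binary path, output name, and Q5_K_M target are placeholders.

```python
import subprocess

# Requantize the BF16 weights into another GGUF format.
# The binary path and target type are placeholders; adjust to your build and needs.
subprocess.run(
    [
        "./llama.cpp/build/bin/llama-quantize",  # path depends on how you built llama.cpp
        "phi-4-mini-bf16.gguf",                  # BF16 source weights from this repo
        "phi-4-mini-q5_k_m.gguf",                # output file to create
        "Q5_K_M",                                # target quantization type
    ],
    check=True,
)
```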

phi-4-mini-bf16-q8.gguf

  • Output & embeddings remain in BF16.
  • All other layers quantized to Q8_0.

phi-4-mini-q4_k_l.gguf

  • Output & embeddings quantized to Q8_0.
  • All other layers quantized to Q4_K.
  • Note: No importance matrix (imatrix) was applied, so the default llama.cpp quantization settings are used.
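
If you want to apply an importance matrix yourself, the sketch below first computes one from a calibration text and then passes it to llama-quantize. It assumes a local llama.cpp build with the llama-imatrix and llama-quantize tools; the calibration file, paths, and Q4_K_M target are placeholders.

```python
import subprocess

BIN = "./llama.cpp/build/bin"  # placeholder: path to your llama.cpp build

# 1. Compute an importance matrix from a calibration corpus of your choice.
subprocess.run(
    [
        f"{BIN}/llama-imatrix",
        "-m", "phi-4-mini-bf16.gguf",  # full-precision source weights
        "-f", "calibration.txt",       # calibration text (placeholder)
        "-o", "imatrix.dat",           # resulting importance matrix
    ],
    check=True,
)

# 2. Quantize with the importance matrix applied.
subprocess.run(
    [
        f"{BIN}/llama-quantize",
        "--imatrix", "imatrix.dat",
        "phi-4-mini-bf16.gguf",
        "phi-4-mini-q4_k_m-imatrix.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```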

phi-4-mini-q6_k.gguf

  • All layers quantized to Q6_K, using default quantization settings.
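
To fetch any of these files programmatically, a minimal sketch using huggingface_hub is shown below; the repo_id is a placeholder for this repository's actual Hub id.

```python
from huggingface_hub import hf_hub_download

# repo_id is a placeholder: replace it with this repository's id on the Hub.
path = hf_hub_download(
    repo_id="your-username/phi-4-mini-instruct-gguf",
    filename="phi-4-mini-q4_k_l.gguf",
)
print(f"Downloaded to {path}")
```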

Model Details

  • Format: GGUF
  • Model size: 3.84B params
  • Architecture: phi3
