Phi-4-mini-instruct GGUF Models
This repository contains the Phi-4-mini-instruct model quantized using a specialized branch of llama.cpp:
ns3284/llama.cpp
Special thanks to @nisparks for adding support for Phi-4-mini-instruct in llama.cpp.
This branch is expected to be merged into the master branch soon; once that happens, it is recommended to use the main llama.cpp repository instead.
Included Files
phi-4-mini-bf16.gguf
- Model weights preserved in BF16.
- Use this if you want to requantize the model into a different format (see the sketch below).
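As a minimal sketch of requantization, the following invokes llama.cpp's llama-quantize tool from Python via subprocess. The binary name and path depend on how you built the ns3284/llama.cpp branch (older builds call it `quantize`), and the output file name and target type here are illustrative, not files in this repository.

```python
# Sketch: requantize the BF16 GGUF into another format with llama.cpp's
# llama-quantize tool, built from the ns3284/llama.cpp branch.
import subprocess

subprocess.run(
    [
        "./llama-quantize",         # path depends on your build
        "phi-4-mini-bf16.gguf",     # input: full-precision weights
        "phi-4-mini-q5_k_m.gguf",   # output: hypothetical file name
        "Q5_K_M",                   # any type llama-quantize lists
    ],
    check=True,
)
```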
phi-4-mini-bf16-q8.gguf
- Output & embeddings remain in BF16.
- All other layers quantized to Q8_0.
phi-4-mini-q4_k_l.gguf
- Output & embeddings quantized to Q8_0.
- All other layers quantized to Q4_K.
- Note: No custom importance matrix (imatrix) was applied, so default llama.cpp quantization settings are used (see the sketch below).
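For contrast, this is roughly what an imatrix-based quantization would look like; this repository's Q4_K_L file does not use one. llama.cpp's llama-imatrix tool computes an importance matrix from a calibration corpus, which llama-quantize can then apply via `--imatrix`. The calibration and output file names here are hypothetical.

```python
# Sketch: imatrix quantization with llama.cpp tools (NOT used for the
# files in this repository). calibration.txt is any representative
# text corpus you supply.
import subprocess

# 1. Compute the importance matrix from calibration text.
subprocess.run(
    ["./llama-imatrix",
     "-m", "phi-4-mini-bf16.gguf",
     "-f", "calibration.txt",
     "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize with the importance matrix applied.
subprocess.run(
    ["./llama-quantize",
     "--imatrix", "imatrix.dat",
     "phi-4-mini-bf16.gguf",
     "phi-4-mini-q4_k-imat.gguf",  # hypothetical output name
     "Q4_K"],
    check=True,
)
```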
phi-4-mini-q6_k.gguf
- All layers quantized to Q6_K, using default quantization settings.
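A minimal inference sketch using the llama-cpp-python bindings is shown below. Note that until the branch is merged upstream, your installed llama-cpp-python build must include this branch's Phi-4-mini support; the prompt and parameters are only examples.

```python
# Sketch: run one of the quantized files with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-4-mini-q4_k_l.gguf",  # any of the files above
    n_ctx=4096,                           # context size; adjust to taste
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```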