---
license: mit
---

# **Phi-4-mini-instruct GGUF Models**

This repository contains the **Phi-4-mini-instruct** model quantized using a specialized branch of **llama.cpp**:
🔗 [ns3284/llama.cpp](https://github.com/ns3284/llama.cpp/tree/master)

Special thanks to [@nisparks](https://github.com/nisparks) for adding **Phi-4-mini-instruct** support to **llama.cpp**. That branch is expected to be merged into master soon; once it is, prefer the main **llama.cpp** repository instead.

---

## **Included Files**

### `phi-4-mini-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format (see the requantization sketch below).

### `phi-4-mini-bf16-q8.gguf`
- **Output & embedding** tensors remain in **BF16**.
- All other layers quantized to **Q8_0**.

### `phi-4-mini-q4_k_l.gguf`
- **Output & embedding** tensors quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- **Note:** No custom importance matrix (imatrix) was used, so default **llama.cpp** quantization settings apply.

### `phi-4-mini-q6_k.gguf`
- All layers quantized to **Q6_K** using default quantization settings.
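
---

## **Requantizing the BF16 Weights**

The `phi-4-mini-bf16.gguf` file keeps the full BF16 weights so it can be requantized into any other GGUF format with llama.cpp's `llama-quantize` tool. Below is a minimal sketch that drives the tool from Python via `subprocess`; the binary path, the output filename, and the `Q5_K_M` target type are illustrative assumptions to adjust for your own build and goals.

```python
import subprocess
from pathlib import Path

# All paths here are assumptions: point them at your local llama.cpp build
# (the ns3284 branch above, or master once the support is merged) and model files.
LLAMA_QUANTIZE = Path("llama.cpp/build/bin/llama-quantize")  # hypothetical build location
SRC = Path("phi-4-mini-bf16.gguf")        # BF16 source weights from this repo
DST = Path("phi-4-mini-q5_k_m.gguf")      # example output name (not a file in this repo)
QUANT_TYPE = "Q5_K_M"                     # any type llama-quantize supports

# llama-quantize takes positional arguments: <input.gguf> <output.gguf> <type>
subprocess.run([str(LLAMA_QUANTIZE), str(SRC), str(DST), QUANT_TYPE], check=True)
print(f"Wrote {DST} ({QUANT_TYPE})")
```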
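
---

## **Running a Quantized Model**

Any of the quantized files can be loaded through the `llama-cpp-python` bindings. This is a minimal sketch rather than an official recipe: it assumes a `llama-cpp-python` build recent enough to include Phi-4-mini support (i.e., built against llama.cpp after the branch above is merged), and the context size and GPU offload settings are placeholders to tune for your hardware.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Model path and parameters are examples; adjust n_ctx / n_gpu_layers as needed.
llm = Llama(
    model_path="phi-4-mini-q4_k_l.gguf",
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to GPU if available; set 0 for CPU-only
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

The Q4_K_L file is the smallest option in this repository and a reasonable pick for memory-constrained machines; the Q6_K and Q8 variants trade more memory for quality closer to the BF16 original.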