---
license: mit
---

# **Phi-4-mini-instruct GGUF Models**

This repository contains the **Phi-4-mini-instruct** model quantized using a specialized branch of **llama.cpp**:
🔗 [ns3284/llama.cpp](https://github.com/ns3284/llama.cpp/tree/master)

Special thanks to [@nisparks](https://github.com/nisparks) for adding **Phi-4-mini-instruct** support to **llama.cpp**. That branch is expected to be merged into master soon; once it is, prefer the main **llama.cpp** repository instead.

---

## **Included Files**

### `phi-4-mini-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format (see the requantization sketch below).

### `phi-4-mini-bf16-q8.gguf`
- **Output & embedding** tensors remain in **BF16**.
- All other layers quantized to **Q8_0**.

### `phi-4-mini-q4_k_l.gguf`
- **Output & embedding** tensors quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- **Note:** No custom importance matrix (imatrix) was used, so default **llama.cpp** quantization settings apply.

### `phi-4-mini-q6_k.gguf`
- All layers quantized to **Q6_K** using default quantization settings.
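
---

## **Requantizing the BF16 Weights**

The `phi-4-mini-bf16.gguf` file keeps the full BF16 weights so it can be requantized into any other GGUF format with llama.cpp's `llama-quantize` tool. Below is a minimal sketch that drives the tool from Python via `subprocess`; the binary path, the output filename, and the `Q5_K_M` target type are illustrative assumptions to adjust for your own build and goals.

```python
import subprocess
from pathlib import Path

# All paths here are assumptions: point them at your local llama.cpp build
# (the ns3284 branch above, or master once the support is merged) and model files.
LLAMA_QUANTIZE = Path("llama.cpp/build/bin/llama-quantize")  # hypothetical build location
SRC = Path("phi-4-mini-bf16.gguf")        # BF16 source weights from this repo
DST = Path("phi-4-mini-q5_k_m.gguf")      # example output name (not a file in this repo)
QUANT_TYPE = "Q5_K_M"                     # any type llama-quantize supports

# llama-quantize takes positional arguments: <input.gguf> <output.gguf> <type>
subprocess.run([str(LLAMA_QUANTIZE), str(SRC), str(DST), QUANT_TYPE], check=True)
print(f"Wrote {DST} ({QUANT_TYPE})")
```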
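
---

## **Running a Quantized Model**

Any of the quantized files can be loaded through the `llama-cpp-python` bindings. This is a minimal sketch rather than an official recipe: it assumes a `llama-cpp-python` build recent enough to include Phi-4-mini support (i.e., built against llama.cpp after the branch above is merged), and the context size and GPU offload settings are placeholders to tune for your hardware.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Model path and parameters are examples; adjust n_ctx / n_gpu_layers as needed.
llm = Llama(
    model_path="phi-4-mini-q4_k_l.gguf",
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to GPU if available; set 0 for CPU-only
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

The Q4_K_L file is the smallest option in this repository and a reasonable pick for memory-constrained machines; the Q6_K and Q8 variants trade more memory for quality closer to the BF16 original.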