Phi-4-mini-instruct GGUF Models

This repository contains the Phi-4-mini-instruct model quantized using a specialized branch of llama.cpp:
🔗 ns3284/llama.cpp

Special thanks to @nisparks for adding support for Phi-4-mini-instruct in llama.cpp.
This branch is expected to be merged into master soon; once that happens, it is recommended to use the main llama.cpp repository instead.
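
For local inference, a minimal sketch using the llama-cpp-python bindings is shown below. It assumes the bindings were built against a llama.cpp version that includes Phi-4-mini-instruct support (the ns3284 branch, or master once the merge lands); the file name, context size, and prompt are illustrative.

```python
from llama_cpp import Llama

# Load one of the GGUF files from this repo (path and context size are illustrative).
llm = Llama(model_path="phi-4-mini-q4_k_l.gguf", n_ctx=4096)

# Run a simple chat completion and print the reply.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what GGUF is in one sentence."}],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```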


Included Files

phi-4-mini-bf16.gguf

  • Model weights preserved in BF16.
  • Use this if you want to requantize the model into a different format.
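
If you do requantize, a minimal sketch is shown below; it assumes a local llama.cpp build that includes the llama-quantize tool, and the binary path, output name, and Q5_K_M target are placeholders.

```python
import subprocess

# Requantize the BF16 weights into another GGUF format.
# The binary path and target type are placeholders; adjust to your build and needs.
subprocess.run(
    [
        "./llama.cpp/build/bin/llama-quantize",  # path depends on how you built llama.cpp
        "phi-4-mini-bf16.gguf",                  # BF16 source weights from this repo
        "phi-4-mini-q5_k_m.gguf",                # output file to create
        "Q5_K_M",                                # target quantization type
    ],
    check=True,
)
```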

phi-4-mini-bf16-q8.gguf

  • Output & embeddings remain in BF16.
  • All other layers quantized to Q8_0.

phi-4-mini-q4_k_l.gguf

  • Output & embeddings quantized to Q8_0.
  • All other layers quantized to Q4_K.
  • Note: No importance matrix (imatrix) was applied, so the default llama.cpp quantization settings are used.
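
If you want to apply an importance matrix yourself, the sketch below first computes one from a calibration text and then passes it to llama-quantize. It assumes a local llama.cpp build with the llama-imatrix and llama-quantize tools; the calibration file, paths, and Q4_K_M target are placeholders.

```python
import subprocess

BIN = "./llama.cpp/build/bin"  # placeholder: path to your llama.cpp build

# 1. Compute an importance matrix from a calibration corpus of your choice.
subprocess.run(
    [
        f"{BIN}/llama-imatrix",
        "-m", "phi-4-mini-bf16.gguf",  # full-precision source weights
        "-f", "calibration.txt",       # calibration text (placeholder)
        "-o", "imatrix.dat",           # resulting importance matrix
    ],
    check=True,
)

# 2. Quantize with the importance matrix applied.
subprocess.run(
    [
        f"{BIN}/llama-quantize",
        "--imatrix", "imatrix.dat",
        "phi-4-mini-bf16.gguf",
        "phi-4-mini-q4_k_m-imatrix.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```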

phi-4-mini-q6_k.gguf

  • All layers quantized to Q6_K, using default quantization settings.
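
To fetch any of these files programmatically, a minimal sketch using huggingface_hub is shown below; the repo_id is a placeholder for this repository's actual Hub id.

```python
from huggingface_hub import hf_hub_download

# repo_id is a placeholder: replace it with this repository's id on the Hub.
path = hf_hub_download(
    repo_id="your-username/phi-4-mini-instruct-gguf",
    filename="phi-4-mini-q4_k_l.gguf",
)
print(f"Downloaded to {path}")
```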

Model Details

  • Format: GGUF
  • Model size: 3.84B params
  • Architecture: phi3
