cortecs/phi-4-FP8-Dynamic

This is an FP8 (dynamic) quantization of phi-4.

The phi-4 model is a cutting-edge open-source LLM developed using a diverse mix of synthetic datasets, curated public domain web content, and acquired academic resources, including books and Q&A datasets. This deliberate data selection ensures the training of compact yet highly capable models with an emphasis on quality and advanced reasoning. To further enhance its performance, phi-4 underwent a rigorous alignment process that included supervised fine-tuning and direct preference optimization, resulting in precise instruction adherence and robust safety measures.

Evaluations

This model provides an accuracy recovery of 99.68%.

English	phi-4	phi-4-FP8-Dynamic (this)
Avg.	70.75	70.7
Arc	68.7	68.7
Hellaswag	72.8	72.7

French	phi-4	phi-4-FP8-Dynamic (this)
Avg.	68.67	68.87
Arc	59.4	59.5
Hellaswag	72.0	72.0
MMLU	74.6	75.1

German	phi-4	phi-4-FP8-Dynamic (this)
Avg.	68.73	68.33
Arc	60.2	60.0
Hellaswag	69.8	69.6
MMLU	76.2	75.4

Italian	phi-4	phi-4-FP8-Dynamic (this)
Avg.	69.3	69.07
Arc	61.1	61.3
Hellaswag	73.1	72.5
MMLU	73.7	73.4

Spanish	phi-4	phi-4-FP8-Dynamic (this)
Avg.	70.6	70.03
Arc	61.6	61.0
Hellaswag	75.3	74.6
MMLU	74.9	74.5

We did not check for data contamination. Evaluation was done using the LM Evaluation Harness (lm-evaluation-harness) with limit=1000, i.e. on at most 1000 examples per task.
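The accuracy recovery quoted above can be approximated from the per-language averages in the tables. The sketch below assumes recovery means the quantized score divided by the phi-4 score, expressed as a percentage; the card does not state the exact aggregation, so the function name `accuracy_recovery` and the choice to pool the five averages are illustrative assumptions rather than the authors' method.

```python
# Per-language "Avg." rows copied from the tables: (phi-4, phi-4-FP8-Dynamic).
averages = {
    "English": (70.75, 70.70),
    "French": (68.67, 68.87),
    "German": (68.73, 68.33),
    "Italian": (69.30, 69.07),
    "Spanish": (70.60, 70.03),
}


def accuracy_recovery(base: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    return 100.0 * quantized / base


# Recovery for each language separately.
per_language = {
    lang: accuracy_recovery(base, quant) for lang, (base, quant) in averages.items()
}

# One pooled figure: ratio of the summed averages (an assumed aggregation).
overall = accuracy_recovery(
    sum(base for base, _ in averages.values()),
    sum(quant for _, quant in averages.values()),
)

for lang, value in per_language.items():
    print(f"{lang}: {value:.2f}%")
print(f"Overall: {overall:.2f}%")
```

Under this definition the pooled value lands close to the reported figure but need not match it exactly, since the reported number may have been aggregated per task or over unrounded scores.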

Usage

Install vLLM and run the server:

pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/phi-4-FP8-Dynamic --max-model-len 16384
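Loading the weights can take a while, so scripts that start the server usually need to wait until it accepts requests. The following is a minimal sketch that polls the OpenAI-compatible `/v1/models` route on the default port; the helper name `wait_for_server` and its timeout defaults are assumptions chosen for illustration.

```python
import time
import urllib.request


def wait_for_server(
    url: str = "http://localhost:8000/v1/models",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    request_timeout_s: float = 5.0,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: server is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_server()` right after launching the server process returns `True` once the model is listed, or `False` if the timeout expires first.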

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/phi-4-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
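The same request can be sent from Python. This is a sketch using only the standard library, assuming the server from the previous step is listening on localhost:8000; the helper names `build_completion_request` and `complete`, and the `max_tokens` and `temperature` defaults, are illustrative choices and not part of the original card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed default vLLM host and port
MODEL = "cortecs/phi-4-FP8-Dynamic"


def build_completion_request(
    prompt: str, max_tokens: int = 64, temperature: float = 0.0
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the prompt to the running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # The completions response carries the generated text in choices[0].text.
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` prints the model's continuation of the prompt.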

⚡ This model is optimized to handle heavy workloads, providing a total throughput of 4623 tokens per second on a single NVIDIA L40S ⚡
