
Converted to ONNX from https://huggingface.co/microsoft/phi-2 in fp16, with the weights block-quantized to int4 wherever possible. The last 3 layers are kept in fp32 to avoid fp16 overflow, and some outputs are cast to fp32 to make the model friendly for ort-web.
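The int4 block quantization mentioned above can be sketched in plain NumPy. This is a hypothetical illustration of symmetric per-block quantization, not the actual conversion script; the block size of 32 and the rounding scheme are assumptions:

```python
import numpy as np

def block_quantize_int4(weights, block_size=32):
    """Quantize a weight tensor to int4 with one fp32 scale per block."""
    flat = weights.astype(np.float32).ravel()
    # Pad so the flattened weights split evenly into blocks.
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # One scale per block: map the largest |value| to the int4 range [-8, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    """Reverse the quantization: scale the int4 values back to fp32."""
    flat = (q.astype(np.float32) * scales).ravel()
    return flat[:int(np.prod(shape))].reshape(shape)

np.random.seed(0)
w = np.random.randn(64, 48).astype(np.float32)
q, s = block_quantize_int4(w)
w_hat = dequantize(q, s, w.shape)
# Per-block rounding error is bounded by scale / 2, i.e. max|block| / 14.
print(np.abs(w - w_hat).max())
```

Each block stores 32 int4 values plus one fp scale, so the weight payload shrinks roughly 4x versus fp16 at the cost of a small per-block rounding error.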
