phi2-int4 / README.md
license: mit

Converted to ONNX from the fp16 model at https://huggingface.co/microsoft/phi-2, with weights block-quantized to int4 wherever possible. The last 3 layers are kept in fp32 to avoid fp16 overflow. Some outputs are cast to fp32 to make the model friendly for ort-web.
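
For readers unfamiliar with block quantization, the following is a minimal sketch of the general idea, not the conversion code used for this model. It assumes a symmetric scheme with one float scale per block and a hypothetical block size of 32; the actual block size, zero-point handling, and packing used here are not stated above. The function names are illustrative only.

```python
# Illustrative sketch only: symmetric block-wise int4 quantization.
# BLOCK_SIZE and the symmetric scheme are assumptions, not details of this model.
BLOCK_SIZE = 32  # hypothetical block size


def quantize_block(block):
    """Map a list of floats to int4 codes in [-8, 7] plus one float scale."""
    max_abs = max(abs(x) for x in block)
    # Use scale 1.0 for an all-zero block to avoid dividing by zero.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate floats from int4 codes and the block scale."""
    return [c * scale for c in codes]


def quantize(weights, block_size=BLOCK_SIZE):
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for codes, scale in blocks:
        out.extend(dequantize_block(codes, scale))
    return out


if __name__ == "__main__":
    # Synthetic weights in roughly [-1, 1]; not taken from the model.
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"blocks: {len(blocks)}, max abs error: {max_err:.4f}")
```

Because each block carries its own scale, a single large weight only degrades the precision of its own block rather than the whole tensor, which is the usual motivation for quantizing per block instead of per tensor.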