---
license: mit
---
Converted to ONNX from https://huggingface.co/microsoft/phi-2.

The model is fp16, with weights block-quantized to int4 wherever possible. The last 3 layers are kept in fp32 to avoid fp16 overflow, and some outputs are cast to fp32 to make the model friendly for ort-web (ONNX Runtime Web).
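As a rough sketch of browser usage with onnxruntime-web (the file name `model.onnx` is an assumption; use the actual file name from this repo's files tab):

```javascript
// Minimal sketch: load the ONNX model in the browser with onnxruntime-web.
import * as ort from "onnxruntime-web";

async function loadSession() {
  // The WASM backend runs everywhere ort-web does; other execution
  // providers may be faster where supported.
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["wasm"],
  });
  // Inspect the graph's input/output names before building a feed.
  console.log("inputs:", session.inputNames);
  console.log("outputs:", session.outputNames);
  return session;
}
```

Because some outputs are fp32, the corresponding output tensors arrive as `Float32Array` data in JavaScript without any extra conversion step.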