---
license: mit
pipeline_tag: text-generation
tags:
- ONNX
- DML
- ONNXRuntime
- phi3
- nlp
- conversational
- custom_code
---
# Phi-3 Mini-128K-Instruct ONNX model for onnxruntime-web
This is the same model as the [official Phi-3 ONNX model](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx), with a few changes to make it work with onnxruntime-web:
1. The model is fp16 with int4 block quantization for the weights.
2. The `logits` output is fp32.
3. The model uses MHA (multi-head attention) instead of GQA (grouped-query attention).
4. The ONNX file and the external data file each stay below 2GB so they remain cacheable in Chromium-based browsers.
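As a quick orientation, here is a minimal loading sketch using onnxruntime-web's `InferenceSession` API. The file names are hypothetical placeholders (use the actual `.onnx` and external data file names from this repository), and the WebGPU execution provider is an assumption:

```js
// Minimal sketch, not a verified example: load the fp16/int4 model in the
// browser with onnxruntime-web.
import * as ort from 'onnxruntime-web';

async function loadSession() {
  return await ort.InferenceSession.create('./phi3-mini-128k-instruct.onnx', {
    // fp16 needs a GPU-backed execution provider such as WebGPU.
    executionProviders: ['webgpu'],
    // The weights live in a separate external data file; register it so the
    // runtime can resolve it. Keeping it under 2GB lets Chromium cache it.
    externalData: [
      {
        path: 'phi3-mini-128k-instruct.onnx.data',
        data: './phi3-mini-128k-instruct.onnx.data',
      },
    ],
  });
}
```

Generation itself would then feed tokenized `input_ids` (plus the KV-cache inputs) each step and read back the fp32 `logits` output.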