Update README.md
README.md CHANGED
@@ -216,7 +216,6 @@ Note that by default, the Phi-3-mini model uses flash attention, which requires
 
 If you want to run the model on:
 * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager"
-* CPU: use the GGUF quantized models 4K
 * Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Cross Platform Support
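For context, the eager-attention load path referenced in the unchanged V100 bullet might look like the sketch below. This is a minimal illustration assuming the Hugging Face transformers API; the model ID is an assumption inferred from the ONNX link above, not taken from this diff.

```python
# Minimal sketch: loading Phi-3-mini with eager attention for V100-era GPUs,
# which lack flash-attention support. The model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,    # half precision fits V100 memory budgets
    attn_implementation="eager",  # fall back from the default flash attention
)

# Quick smoke test: tokenize a prompt and generate a short completion.
inputs = tokenizer("Hello, Phi-3!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```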