Update README.md
README.md CHANGED
@@ -216,7 +216,6 @@ Note that by default, the Phi-3-mini model uses flash attention, which requires
 
 If you want to run the model on:
 * NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager"
-* CPU: use the GGUF quantized models 4K
 * Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Cross Platform Support
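For context, the eager-attention load path referenced in the unchanged V100 bullet might look like the sketch below. This is a minimal illustration assuming the Hugging Face transformers API; the model ID is an assumption inferred from the ONNX link above, not taken from this diff.

```python
# Minimal sketch: loading Phi-3-mini with eager attention for V100-era GPUs,
# which lack flash-attention support. The model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,    # half precision fits V100 memory budgets
    attn_implementation="eager",  # fall back from the default flash attention
)

# Quick smoke test: tokenize a prompt and generate a short completion.
inputs = tokenizer("Hello, Phi-3!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```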