Update README.md

README.md CHANGED

@@ -105,9 +105,7 @@ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
 messages = [
     {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
-    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together:"},
-    {"role": "system", "content": "1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey."},
-    {"role": "system", "content": "2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
     {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
 ]
 
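For reference, the README feeds this `messages` list to a Hugging Face `pipeline` just below the hunk. The following is a minimal sketch of how the corrected conversation runs end to end; the generation settings are illustrative assumptions, not values taken from this diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Model/tokenizer setup mirrors the hunk context above; trust_remote_code
# and the dtype choice are assumptions about the surrounding README.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

# Recent transformers releases apply the model's chat template automatically
# when a text-generation pipeline receives a list of role/content dicts.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = pipe(messages, max_new_tokens=256, do_sample=False, return_full_text=False)
print(output[0]["generated_text"])
```

The point of the change in this hunk is that the model's reply now forms a single `assistant` turn, which is what chat templates expect; splitting the reply across `system` turns, as before, misattributed it.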
@@ -132,7 +130,7 @@ Note that by default the model uses flash attention, which requires certain types
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
 + CPU: use the **GGUF** quantized models [4K](https://aka.ms/Phi3-mini-4k-instruct-gguf)
-+ Optimized inference: use the **ONNX** models [4K](https://aka.ms/Phi3-mini-4k-instruct-onnx)
++ Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [4K](https://aka.ms/Phi3-mini-4k-instruct-onnx)
 
 ## Responsible AI Considerations
 
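To make the first bullet in the hunk above concrete, here is a minimal sketch of the eager-attention fallback for V100-class GPUs. `attn_implementation` is a standard `from_pretrained` argument in recent `transformers` versions; the dtype and `trust_remote_code` choices are assumptions:

```python
from transformers import AutoModelForCausalLM

# On V100 or earlier GPUs flash attention is not supported, so fall back to
# the eager attention implementation, as the bullet above recommends.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",           # assumption: let transformers choose the dtype
    attn_implementation="eager",  # overrides the default flash attention path
    trust_remote_code=True,       # assumption, mirroring typical Phi-3 usage
)
```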