InferenceIllusionist committed
Commit ff25df5
Parent(s): cc2e987

Update README.md

README.md CHANGED
@@ -30,7 +30,6 @@ Quantized from mini-magnum-12b-v1.1 fp16
 >* If you are getting a `cudaMalloc failed: out of memory` error, try passing an argument for lower context in llama.cpp, e.g. for 8k: `-c 8192`
 >* If you have all ampere generation or newer cards, you can use flash attention like so: `-fa`
 >* Provided Flash Attention is enabled you can also use quantized cache to save on VRAM e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
->* Mistral recommends a temperature of 0.3 for this model


 Original model card can be found [here](https://huggingface.co/intervitens/mini-magnum-12b-v1.1)
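For reference, the flags mentioned in the README tips above combine into a single llama.cpp invocation roughly as sketched below. This is an illustrative example, not part of the commit: the `llama-cli` binary name (older builds use `./main`), the quant filename, and the `-ngl 99` offload value are assumptions about your setup.

```sh
# Hypothetical invocation (model filename and -ngl value are placeholders):
#   -c 8192             lower the context window if you hit "cudaMalloc failed: out of memory"
#   -fa                 enable flash attention (Ampere-generation or newer GPUs)
#   -ctk/-ctv q8_0      8-bit quantized KV cache; only applies when flash attention is enabled
./llama-cli -m mini-magnum-12b-v1.1-Q4_K_M.gguf -ngl 99 -c 8192 -fa -ctk q8_0 -ctv q8_0
```

On pre-Ampere cards, drop `-fa` along with `-ctk`/`-ctv`, since the quantized cache option is only valid with flash attention enabled.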