InferenceIllusionist committed
Commit ff25df5
Parent(s): cc2e987

Update README.md

README.md CHANGED
@@ -30,7 +30,6 @@ Quantized from mini-magnum-12b-v1.1 fp16
 >* If you are getting a `cudaMalloc failed: out of memory` error, try passing an argument for lower context in llama.cpp, e.g. for 8k: `-c 8192`
 >* If you have all ampere generation or newer cards, you can use flash attention like so: `-fa`
 >* Provided Flash Attention is enabled you can also use quantized cache to save on VRAM e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
->* Mistral recommends a temperature of 0.3 for this model


 Original model card can be found [here](https://huggingface.co/intervitens/mini-magnum-12b-v1.1)
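For reference, the flags mentioned in the README tips above combine into a single llama.cpp invocation roughly as sketched below. This is an illustrative example, not part of the commit: the `llama-cli` binary name (older builds use `./main`), the quant filename, and the `-ngl 99` offload value are assumptions about your setup.

```sh
# Hypothetical invocation (model filename and -ngl value are placeholders):
#   -c 8192             lower the context window if you hit "cudaMalloc failed: out of memory"
#   -fa                 enable flash attention (Ampere-generation or newer GPUs)
#   -ctk/-ctv q8_0      8-bit quantized KV cache; only applies when flash attention is enabled
./llama-cli -m mini-magnum-12b-v1.1-Q4_K_M.gguf -ngl 99 -c 8192 -fa -ctk q8_0 -ctv q8_0
```

On pre-Ampere cards, drop `-fa` along with `-ctk`/`-ctv`, since the quantized cache option is only valid with flash attention enabled.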