InferenceIllusionist committed on
Commit 7144ac6
1 Parent(s): f64223d

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -9,7 +9,14 @@ license: cc-by-nc-4.0
  * Model creator: [Sao10K](https://huggingface.co/Sao10K/)
  * Original model: [Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
 
- <b>Important: </b> Inferencing for newer formats such as IQ3_S, IQ4_NL tested on latest llama.cpp & koboldcpp v.1.59.1. IQ1_S is only functional on llama.cpp as of 2/26/24.
+ <b>Update 3/4/24: </b> The newest I-quant format, <b>[IQ4_XS](https://huggingface.co/InferenceIllusionist/Fimbulvetr-11B-v2-iMat-GGUF/blob/main/Fimbulvetr-11B-v2-iMat-IQ4_NL.gguf)</b>, shows superior performance to previous I-quants at a whopping 4.25 bpw in [benchmarks](https://github.com/ggerganov/llama.cpp/pull/5747).
+
+ Tested on the latest llama.cpp & koboldcpp v1.60.
+
+ <h4>This model fits a whole lot into its size! Impressed by its understanding of other languages.</h4>
+ <img src="https://huggingface.co/InferenceIllusionist/Fimbulvetr-11B-v2-iMat-GGUF/resolve/main/Fimbulvetr-11B-v2%20IQ4_XS.JPG" width="850"/>
+
+ <b>Tip: Select the biggest size that you can fit in VRAM while still allowing some space for context.</b>
 
  All credits to Sao10K for the original model. This is just a quick test of the new quantization types such as IQ3_S in an attempt to further reduce VRAM requirements.
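As a rough sanity check for the VRAM tip above, the 4.25 bpw figure can be turned into an approximate weight footprint. This is a back-of-envelope sketch, not an exact measurement: the ~11B parameter count is taken from the model name, and it deliberately ignores KV cache and runtime overhead, which is why you should leave headroom for context.

```python
# Back-of-envelope GGUF weight-memory estimate for a model quantized at a
# given bits-per-weight (bpw). This is a lower bound on VRAM use: it does
# not include the KV cache or runtime buffers, so pick a quant that leaves
# spare room for context on top of this number.
def weight_size_gib(n_params: float, bpw: float) -> float:
    """Approximate size of quantized weights in GiB."""
    return n_params * bpw / 8 / 2**30  # bits -> bytes -> GiB

# IQ4_XS at 4.25 bpw on an ~11B-parameter model:
size = weight_size_gib(11e9, 4.25)
print(f"~{size:.1f} GiB of weights")  # roughly 5.4 GiB before context/KV cache
```

On a 8 GB card this leaves a couple of GiB for context, which is consistent with the tip to take the largest quant that still fits with some headroom.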