InferenceIllusionist committed e2bc689 (parent: cc7a8bb): Update README.md

README.md CHANGED
@@ -10,6 +10,13 @@ tags:
 <img src="https://i.imgur.com/P68dXux.png" width="400"/>
 
 # Meta-Llama-3.1-8B-Claude-iMat-GGUF
+>[!TIP]
+><h2>7/28 Update:</h2>
+>
+>* Reconverted using llama.cpp [b3479](https://github.com/ggerganov/llama.cpp/releases?page=1), which adds Llama 3.1 rope scaling factors to llama conversion and inference, improving results for context windows above 8192
+>* Importance matrix re-calculated with the updated fp16 GGUF
+>* If using Kobold.cpp, make sure you are on [v1.71.1](https://github.com/LostRuins/koboldcpp/releases/tag/v1.71.1) or later to take advantage of rope scaling
+
 
 Quantized from Meta-Llama-3.1-8B-Claude fp16
 * Weighted quantizations were created using the fp16 GGUF and groups_merged.txt in 88 chunks with n_ctx=512
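
For readers following along, the reconversion step described in the update note above corresponds roughly to llama.cpp's HF-to-GGUF converter. A minimal sketch, assuming a checkout of llama.cpp at b3479 and the source checkpoint in `./Meta-Llama-3.1-8B-Claude` (the directory and output names are illustrative, not taken from the commit):

```bash
# Convert the HF checkpoint to an fp16 GGUF; as of b3479 this script
# also writes the Llama 3.1 rope scaling factors into the GGUF metadata.
python convert_hf_to_gguf.py ./Meta-Llama-3.1-8B-Claude \
    --outtype f16 \
    --outfile Meta-Llama-3.1-8B-Claude-fp16.gguf
```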
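The importance-matrix recalculation plausibly maps to llama.cpp's `llama-imatrix` tool, using the chunk count and context size stated in the README; the model and output file names are assumptions:

```bash
# Re-compute the importance matrix from the updated fp16 GGUF,
# processing groups_merged.txt in 88 chunks at a context of 512.
./llama-imatrix -m Meta-Llama-3.1-8B-Claude-fp16.gguf \
    -f groups_merged.txt \
    -c 512 --chunks 88 \
    -o imatrix.dat
```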
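The weighted quantizations themselves would then be produced with `llama-quantize`, feeding in the imatrix; the quant type shown (Q4_K_M) is only an example, not a statement of which types this repo ships:

```bash
# Produce a weighted (imatrix-guided) quant from the fp16 GGUF.
./llama-quantize --imatrix imatrix.dat \
    Meta-Llama-3.1-8B-Claude-fp16.gguf \
    Meta-Llama-3.1-8B-Claude-Q4_K_M.gguf \
    Q4_K_M
```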