Update README.md

README.md CHANGED

@@ -7,31 +7,31 @@ base_model:
quantized_by: Simon Barnes
---

# Quantized Mistral-NeMo-Instruct-2407 versions for Prompt Sensitivity Studies

This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using [llama.cpp](https://github.com/ggerganov/llama.cpp/). The goal was to examine how different quantization methods affect prompt sensitivity on sentiment classification tasks.

## Quantization Details

Models were quantized using llama.cpp (release [b3922](https://github.com/ggerganov/llama.cpp/releases/tag/b3922)). The imatrix versions used an `imatrix.dat` file created from Bartowski's [calibration dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), mentioned [here](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF).
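
As a rough sketch of that workflow (the exact invocations aren't recorded here, and the f16 source filename and calibration filename below are assumptions):

```sh
# Build the importance matrix from the calibration text (filenames assumed)
./llama-imatrix -m Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    -f calibration_data.txt -o imatrix.dat

# Default (non-imatrix) 5-bit quantization
./llama-quantize Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# Same target type, but guided by the importance matrix
./llama-quantize --imatrix imatrix.dat Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```

The Q8_0 variants would follow the same pattern, with `Q8_0` as the target type.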

## Models

| Filename | Size | Description |
|----------|------|-------------|
| Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix quantization |

I've also included the `imatrix.dat` (7.05 MB) file used to create the imatrix-quantized versions.
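
To try one of the files with llama.cpp, something like the following should work (the prompt is illustrative, not one of the prompts from the experiments):

```sh
# Run the 5-bit imatrix variant on a single sentiment prompt, greedy decoding
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf \
    -p "Classify the sentiment of: 'The movie was wonderful.'" \
    -n 16 --temp 0
```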

## Findings

Prompt sensitivity was observed specifically in 5-bit models using imatrix quantization, but not in other variants.
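
One way to see this kind of effect is to hold the input fixed, vary only the prompt wording, and decode greedily; a minimal sketch (the phrasings below are illustrative, not the exact prompts used in the experiments):

```sh
# Same review under two prompt phrasings; per the finding above, the 5-bit
# imatrix model may answer inconsistently where the other variants do not.
for TEMPLATE in "Classify the sentiment of" "What is the sentiment of"; do
  ./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf \
    -p "$TEMPLATE: 'The plot dragged, but the acting was superb.'" \
    -n 16 --temp 0
done
```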

For further discussion, please see my accompanying [blog post](URL).

## Author