NLPoetic committed
Commit 4a02c0d
1 Parent(s): c84bc36

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -7,31 +7,31 @@ base_model:
  quantized_by: Simon Barnes
  ---
 
- # Quantized Mistral-NeMo-Instruct-2407 versions for Prompt Sensitivity studies
+ # Quantized Mistral-NeMo-Instruct-2407 versions for Prompt Sensitivity Studies
 
  This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using [llama.cpp](https://github.com/ggerganov/llama.cpp/). The goal was to examine how different quantization methods affect prompt sensitivity with sentiment classification tasks.
 
  ## Quantization Details
 
- Models were quantized using llama.cpp (release [b3922](https://github.com/ggerganov/llama.cpp/releases/tag/b3922)). The imatrix versions use the calibration dataset from [Bartowski](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), as discussed [here](bartowski/Mistral-Nemo-Instruct-2407-GGUF).
+ Models were quantized using llama.cpp (release [b3922](https://github.com/ggerganov/llama.cpp/releases/tag/b3922)). The imatrix versions used an `imatrix.dat` file created from Bartowski's [calibration dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), mentioned [here](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF).
 
 
  ## Models
 
  | Filename | Size | Description |
  |----------|------|-------------|
- | Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit standard |
- | Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit standard |
- | Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix |
- | Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix |
+ | Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit default quantization |
+ | Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit default quantization |
+ | Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix quantization |
+ | Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix quantization |
 
- The repository also includes the imatrix.dat (7.05 MB) file use for creating these imatrix-quantized versions.
+ I've also included the `imatrix.dat` (7.05 MB) file used to create the imatrix-quantized versions.
 
- ## Key Finding
+ ## Findings
 
  Prompt sensitivity was observed specifically in 5-bit models using imatrix quantization, but not in other variants.
 
- For methodology, findings, and implications, please see my accompanying [blog post](URL).
+ For further discussion please see my accompanying [blog post](URL).
 
  ## Author
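
For anyone reproducing the quantization workflow this commit describes, below is a minimal sketch of how the default and imatrix GGUF variants are typically produced with llama.cpp's command-line tools at release b3922. The input filename `Mistral-Nemo-Instruct-2407-f16.gguf` and the calibration filename `calibration_datav3.txt` are assumptions for illustration, not taken from the commit:

```bash
# Build the quantization tools at the release named in the README (b3922).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b3922
make llama-quantize llama-imatrix

# Default quantization: round the FP16 GGUF export straight to Q5_0.
./llama-quantize Mistral-Nemo-Instruct-2407-f16.gguf \
  Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# Imatrix quantization, step 1: collect per-weight activation statistics
# over the calibration text, producing imatrix.dat ...
./llama-imatrix -m Mistral-Nemo-Instruct-2407-f16.gguf \
  -f calibration_datav3.txt -o imatrix.dat

# ... step 2: pass the importance matrix so that rounding error is
# weighted by how strongly each weight is activated on the calibration data.
./llama-quantize --imatrix imatrix.dat \
  Mistral-Nemo-Instruct-2407-f16.gguf \
  Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```

Repeating the two `llama-quantize` calls with `Q8_0` in place of `Q5_0` would yield the 8-bit variants listed in the table.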