Update README.md
README.md CHANGED
@@ -15,7 +15,6 @@ This repository contains four quantized versions of Mistral-NeMo-Instruct-2407,

Models were quantized using llama.cpp (release [b3922](https://github.com/ggerganov/llama.cpp/releases/tag/b3922)). The imatrix versions used an `imatrix.dat` file created from Bartowski's [calibration dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), mentioned [here](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF).

-
## Models

| Filename | Size | Description |
@@ -29,7 +28,7 @@ I've also included the `imatrix.dat` (7.05 MB) file used to create the imatrix-q

## Findings

-Prompt sensitivity was
+Prompt sensitivity was observed in 5-bit models quantized with an imatrix, but not in 5-bit models quantized with llama.cpp's default settings. It was not observed in 8-bit models under either quantization method.

For further discussion please see my accompanying [blog post](https://www.drsimonbarnes.com/posts/prompt-sensitivity-revisited-open-source-models/).
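For reference, the quantization workflow described above can be reproduced with llama.cpp's `llama-imatrix` and `llama-quantize` tools. This is a minimal sketch rather than the exact commands used for this repository: the F16 source filename, the local `calibration.txt` path, the output filenames, and the `Q5_K_M` quantization type are assumptions.

```bash
# Build an importance matrix from the calibration dataset
# (calibration.txt is an assumed local copy of Bartowski's calibration data)
./llama-imatrix -m Mistral-Nemo-Instruct-2407-F16.gguf -f calibration.txt -o imatrix.dat

# 5-bit quantization with llama.cpp's default settings
./llama-quantize Mistral-Nemo-Instruct-2407-F16.gguf Mistral-Nemo-Q5_K_M.gguf Q5_K_M

# 5-bit quantization guided by the importance matrix
./llama-quantize --imatrix imatrix.dat Mistral-Nemo-Instruct-2407-F16.gguf Mistral-Nemo-Q5_K_M-imatrix.gguf Q5_K_M
```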