---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
quantized_by: Simon Barnes
---

# Quantized Mistral-NeMo-Instruct-2407 Versions for Prompt Sensitivity Studies

This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using [llama.cpp](https://github.com/ggerganov/llama.cpp/). The goal was to examine how different quantization methods affect prompt sensitivity on sentiment classification tasks.

## Quantization Details

Models were quantized using llama.cpp (release [b3922](https://github.com/ggerganov/llama.cpp/releases/tag/b3922)). The imatrix versions use the calibration dataset from [Bartowski](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), as discussed [here](bartowski/Mistral-Nemo-Instruct-2407-GGUF).
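
For reference, a workflow like this can be sketched with llama.cpp's `llama-imatrix` and `llama-quantize` tools. This is only a rough sketch: the FP16 source filename and calibration filename below are illustrative assumptions, not the exact invocations used for this repository.

```shell
# Build an importance matrix from a calibration dataset
# (input filenames here are illustrative)
./llama-imatrix -m Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    -f calibration_data.txt -o imatrix.dat

# Standard 5-bit quantization (no imatrix)
./llama-quantize Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# 5-bit quantization guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```

The Q8_0 variants follow the same pattern with `Q8_0` as the target type.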

## Models

| Filename | Size | Description |
|----------|------|-------------|
| Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit standard |
| Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit standard |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix |

The repository also includes the imatrix.dat file (7.05 MB) used to create the imatrix-quantized versions.
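
The models above can be run directly with llama.cpp's `llama-cli`. A minimal sketch (the prompt is illustrative, not one from the study):

```shell
# One-shot sentiment classification against one of the quantized models
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf \
    -p "Classify the sentiment of this review as positive or negative: \
'The film was a delight from start to finish.'" \
    -n 16 --temp 0
```

Greedy decoding (`--temp 0`) is the natural setting when comparing how quantization variants respond to prompt changes, since it removes sampling noise from the comparison.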

## Key Finding

Prompt sensitivity was observed specifically in the 5-bit model quantized with an imatrix, but not in the other variants.

For methodology, findings, and implications, please see my accompanying [blog post](URL).

## Author

Simon Barnes