---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
quantized_by: Simon Barnes
---
# Quantized Mistral-NeMo-Instruct-2407 versions for Prompt Sensitivity Blog

This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using llama.cpp. The goal was to examine how different quantization methods affect prompt sensitivity on sentiment classification tasks.
## Quantization Details

Models were quantized using llama.cpp (release b3922). The imatrix versions used an `imatrix.dat` file created from Bartowski's calibration dataset, mentioned here.
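For reference, the commands below sketch the general llama.cpp workflow; the f16 GGUF and calibration file names are illustrative placeholders, and the exact invocations used may have differed.

```bash
# Build the importance matrix from the calibration dataset
# (run against the full-precision GGUF conversion of the model)
./llama-imatrix -m Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    -f calibration_data.txt -o imatrix.dat

# Default quantization (no imatrix)
./llama-quantize Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# imatrix-guided quantization
./llama-quantize --imatrix imatrix.dat \
    Mistral-NeMo-12B-Instruct-2407-f16.gguf \
    Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```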
## Models

| Filename | Size | Description |
|---|---|---|
| Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix quantization |
I've also included the `imatrix.dat` file (7.05 MB) used to create the imatrix-quantized versions.
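To try one of the files, something like the following should work with the `llama-cli` binary from the same llama.cpp release; the prompt and sampling settings here are illustrative, not the exact ones from the experiment.

```bash
# Greedy decoding keeps the classification deterministic
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf \
    -p "Classify the sentiment of this review as positive or negative: \"Great film!\"" \
    -n 16 --temp 0
```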
## Findings

Prompt sensitivity appeared only in the 5-bit model quantized with an imatrix; the 5-bit model quantized with llama.cpp's default settings did not show it. Neither 8-bit model showed prompt sensitivity with either quantization method.
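As a rough illustration of the kind of test involved, prompt sensitivity can be probed by running the same input through small paraphrases of the instruction and checking whether the predicted label changes. The prompts below are hypothetical examples, not the ones used in the blog post.

```bash
# Two paraphrased instructions for the same review; a prompt-sensitive
# model may flip its label between them.
for PROMPT in \
    "Classify the sentiment of this review as positive or negative:" \
    "Is the sentiment of the following review positive or negative?"; do
  ./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf \
      -p "$PROMPT \"The plot dragged but the acting was superb.\"" \
      -n 8 --temp 0
done
```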
For further discussion, please see my accompanying blog post.
## Author
Simon Barnes