---
license: apache-2.0
language:
  - en
base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
quantized_by: Simon Barnes
---

# Quantized Mistral-NeMo-Instruct-2407 Versions for Prompt Sensitivity Blog

This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using llama.cpp. The goal was to examine how different quantization methods affect prompt sensitivity on sentiment classification tasks.

## Quantization Details

Models were quantized using llama.cpp (release b3922). The imatrix versions were produced with an imatrix.dat file generated from Bartowski's calibration dataset.
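For reference, a quantization run along these lines can be reproduced with llama.cpp's command-line tools. This is a sketch only: the F16 input filename and `calibration.txt` are placeholders, and the tool names reflect llama.cpp builds around release b3922.

```shell
# Build an importance matrix from a calibration text file
# (calibration.txt stands in for the calibration dataset).
./llama-imatrix -m Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    -f calibration.txt -o imatrix.dat

# Default 5-bit quantization
./llama-quantize Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# 5-bit quantization guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    Mistral-NeMo-12B-Instruct-2407-F16.gguf \
    Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```

The Q8_0 variants follow the same pattern with `Q8_0` as the target type.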

## Models

| Filename | Size | Description |
|---|---|---|
| Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit default quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit with imatrix quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit with imatrix quantization |

I've also included the imatrix.dat file (7.05 MB) used to create the imatrix-quantized versions.

## Findings

Prompt sensitivity was observed only in the 5-bit model quantized with imatrix; the 5-bit model produced with llama.cpp's default quantization settings did not show it. Neither 8-bit model exhibited prompt sensitivity, regardless of quantization method.
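One simple way to quantify prompt sensitivity on a classification task is to run the same inputs through several paraphrased prompt templates and measure how often the predicted labels agree. The sketch below is my illustration of that idea, not the blog post's actual evaluation code; the toy label lists stand in for real model outputs.

```python
def agreement_rate(predictions_per_prompt):
    """Fraction of examples for which every prompt variant
    yields the same predicted label."""
    n_examples = len(predictions_per_prompt[0])
    agree = sum(
        1 for i in range(n_examples)
        # Collect each variant's label for example i; a set of size 1
        # means all prompt variants agreed on that example.
        if len({preds[i] for preds in predictions_per_prompt}) == 1
    )
    return agree / n_examples

# Toy predictions from three prompt variants over five reviews:
variant_a = ["pos", "neg", "pos", "neg", "pos"]
variant_b = ["pos", "neg", "pos", "pos", "pos"]  # disagrees on the 4th review
variant_c = ["pos", "neg", "pos", "neg", "pos"]

rate = agreement_rate([variant_a, variant_b, variant_c])
print(rate)  # 0.8 -> the label flipped for one of five reviews
```

A prompt-sensitive model shows a noticeably lower agreement rate across paraphrases, while an insensitive one stays near 1.0.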

For further discussion, please see my accompanying blog post.

## Author

Simon Barnes