Gemma 2 27b Instruction Tuned - GGUF

These are GGUF quantizations of google/gemma-2-27b-it.

Details about the model can be found at the above model page.

Llamacpp Version

These quants were made with llama.cpp tag b3408.

If you have problems loading these models, please update your software to use the latest llama.cpp version.

Perplexity Scoring

Below are the perplexity scores for the GGUF models. A lower score is better.

Quant Level	Perplexity Score	Standard Deviation
F32	7.1853	0.04922
BF16	7.1853	0.04922
Q8_0	7.1879	0.04924
Q6_K	7.2182	0.04948
Q5_K_M	7.2333	0.04953
Q5_K_S	7.2204	0.04931
Q4_K_M	7.4192	0.05149
Q4_K_S	7.5403	0.05231
Q3_K_L	7.4623	0.05128
Q3_K_M	7.7375	0.05362
Q3_K_S	8.0426	0.05546
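
To make the table easier to compare, each score can be expressed as a percentage increase over the F32 baseline. The following is a minimal sketch; the helper name ppl_delta is hypothetical and not part of llama.cpp, and the numbers are copied from the table above.

```shell
#!/bin/bash

# Percentage increase in perplexity of a quant relative to a baseline.
# Usage: ppl_delta <baseline_ppl> <quant_ppl>
# (ppl_delta is an illustrative helper, not a llama.cpp tool.)
ppl_delta() {
    awk -v base="$1" -v q="$2" 'BEGIN { printf "%.2f\n", (q - base) / base * 100 }'
}

# Scores copied from the table above; F32 (7.1853) is the baseline.
echo "Q8_0:   $(ppl_delta 7.1853 7.1879)%"
echo "Q4_K_M: $(ppl_delta 7.1853 7.4192)%"
echo "Q3_K_S: $(ppl_delta 7.1853 8.0426)%"
```

A smaller percentage means the quant stays closer to the unquantized model on this measurement.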

Quant Details

This is the script used for quantization.

#!/bin/bash

# Name of the model to convert and quantize
MODEL_NAME="gemma-2-27b-it"

# Define the output directory
outputDir="${MODEL_NAME}-GGUF"

# Create the output directory if it doesn't exist
mkdir -p "${outputDir}"

# Convert the Hugging Face checkpoint to an F32 GGUF (skipped if it already exists)
f32file="${outputDir}/${MODEL_NAME}-F32.gguf"
if [ -f "${f32file}" ]; then
    echo "Skipping f32 as ${f32file} already exists."
else
    # Use ${HOME} rather than "~": a tilde inside double quotes is not expanded by bash
    python convert_hf_to_gguf.py "${HOME}/src/models/${MODEL_NAME}" --outfile "${f32file}" --outtype "f32"
fi

# Abort if the F32 conversion did not produce a file
if [ ! -f "${f32file}" ]; then
   echo "No ${f32file} found."
   exit 1
fi

# Define the array of quantization strings
quants=("Q8_0" "Q6_K" "Q5_K_M" "Q5_K_S" "Q4_K_M" "Q4_K_S" "Q3_K_L" "Q3_K_M" "Q3_K_S")


# Loop through the quants array
for quant in "${quants[@]}"; do
    outfile="${outputDir}/${MODEL_NAME}-${quant}.gguf"
    
    # Check if the outfile already exists
    if [ -f "${outfile}" ]; then
        echo "Skipping ${quant} as ${outfile} already exists."
    else
        # Quantize from the F32 file; report success only if llama-quantize exits cleanly
        if ./llama-quantize "${f32file}" "${outfile}" "${quant}"; then
            echo "Processed ${quant} and generated ${outfile}"
        else
            echo "Failed to quantize ${quant}." >&2
        fi
    fi
done
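
After a run, it can be useful to confirm that every expected file was written. The following is a minimal sketch that assumes the same file naming as the script above; the helper name missing_quants is hypothetical and not part of llama.cpp.

```shell
#!/bin/bash

# Print the path of every expected GGUF file that is missing or empty.
# Usage: missing_quants <model_name> <output_dir> <quant>...
# (missing_quants is an illustrative helper, not a llama.cpp tool.)
missing_quants() {
    local model="$1" dir="$2"
    shift 2
    local quant file
    for quant in "$@"; do
        file="${dir}/${model}-${quant}.gguf"
        # -s is true only when the file exists and is not empty
        [ -s "${file}" ] || echo "${file}"
    done
}

MODEL_NAME="gemma-2-27b-it"
missing_quants "${MODEL_NAME}" "${MODEL_NAME}-GGUF" \
    F32 Q8_0 Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_L Q3_K_M Q3_K_S
```

If nothing is printed, all listed files are present. Because the quantization script skips outputs that already exist, any file reported here should be deleted (if it is empty or partial) before re-running the script.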