What's going on with the PPL on the BF16 gemma 4 e2b?

#1
by an0nya - opened

Model | Size | BPW | PPL | Speed
BF16 (original) | 8.67 GB | 16.00 | 154.0 | 4.2 t/s
ggml Q2_K + iMatrix | 2.77 GB | 5.12 | 89.1 | 14.0 t/s
HPC Q2_K + Q4_0·Shor | 1.44 GB | ~3.0 | 129.6 | 18.1 t/s

That's a measurement artifact from a previous version that started from a safetensors base. I was lazy about updating the README because I didn't expect anybody to find this utility, let alone show any interest in using it.
