T5 GGUF Analysis

This document records the T5-small GGUF evaluation run.

Environment

Verified runtime:

| item | value |
| --- | --- |
| Python | 3.11.12 |
| Torch | 2.9.0+cu129 |
| Torch CUDA | 12.9 |
| CUDA available | True |
| GPU | NVIDIA GeForce RTX 3070 Laptop GPU |
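
A runtime report like the one above can be collected with a short script. This is a minimal sketch, not the script used for this run; the helper name `collect_runtime_info` is hypothetical, and it falls back gracefully when Torch is not installed.

```python
import platform


def collect_runtime_info():
    """Return the runtime facts listed in the environment table as a dict.

    Hypothetical helper for illustration; not taken from the evaluation code.
    """
    info = {"Python": platform.python_version()}
    try:
        import torch  # optional: only needed for the Torch/CUDA rows
    except ImportError:
        info["Torch"] = "not installed"
        return info

    info["Torch"] = torch.__version__
    info["Torch CUDA"] = torch.version.cuda  # None on CPU-only builds
    info["CUDA available"] = torch.cuda.is_available()
    if info["CUDA available"]:
        info["GPU"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for item, value in collect_runtime_info().items():
        print(f"{item}: {value}")
```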

Models

The run evaluated these GGUFs:

| model | role |
| --- | --- |
| t5-small-f32.gguf | unquantized reference baseline |
| t5-small-f16.gguf | high-precision comparison and quantization source |
| t5-small-q8_0.gguf | quantized |
| t5-small-q5_k_m.gguf | quantized |
| t5-small-q4_k_m.gguf | quantized |
| t5-small-q4_0.gguf | quantized |
| t5-small-q3_k_m.gguf | quantized |
| t5-small-q2_k.gguf | quantized |

Conversion Check Results

The conversion check compares greedy-decoded outputs from the Hugging Face (HF) model against greedy-decoded outputs from the f32 GGUF. It validates that the unquantized GGUF is a usable reference before quantized models are compared against it.

| dataset | examples | exact match | chrF | first token match |
| --- | --- | --- | --- | --- |
| CoLA | 2,000 | 1.000 | 1.000 | 1.000 |
| summarization | 2,000 | 0.117 | 0.953 | 0.990 |
| translation en-de | 2,000 | 0.993 | 0.996 | 1.000 |
| translation en-fr | 2,000 | 0.986 | 0.995 | 1.000 |
| overall | 8,000 | 0.774 | 0.986 | 0.997 |

Interpretation:

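  • The f32 GGUF tracks HF closely overall (chrF 0.986, first token match 0.997).
  • Summarization has low exact match (0.117) but high chrF (0.953) and first token match (0.990), which points to small wording differences in long outputs rather than broad conversion drift.
  • Translation and CoLA effectively match at the output level (exact match of 0.986 or higher).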
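
The exact match and first token match columns can be sketched as below. This is an illustrative sketch under stated assumptions, not the evaluation code from this run: the function names are hypothetical, and "first token" is approximated by the first whitespace-separated word, whereas the run may have compared tokenizer IDs. chrF is omitted here and would normally come from a metrics library.

```python
def exact_match(hyps, refs):
    """Fraction of outputs that are identical after trimming whitespace."""
    if len(hyps) != len(refs):
        raise ValueError("hyps and refs must have the same length")
    return sum(h.strip() == r.strip() for h, r in zip(hyps, refs)) / len(refs)


def first_token_match(hyps, refs):
    """Fraction of outputs whose first whitespace-separated token agrees.

    Assumption: whitespace tokens stand in for model tokens.
    """
    if len(hyps) != len(refs):
        raise ValueError("hyps and refs must have the same length")

    def first(text):
        parts = text.split()
        return parts[0] if parts else ""

    return sum(first(h) == first(r) for h, r in zip(hyps, refs)) / len(refs)


# Made-up outputs for illustration only.
hf_outputs = ["acceptable", "Das Haus ist klein.", "the cat sat on the mat"]
gguf_outputs = ["acceptable", "Das Haus ist klein.", "the cat sat on a mat"]

print(exact_match(gguf_outputs, hf_outputs))        # two of three match exactly
print(first_token_match(gguf_outputs, hf_outputs))  # all first tokens match
```

The third pair shows the summarization pattern from the table: the outputs differ in one word, so exact match drops while the first token still agrees.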

Generation Results

Generation used greedy decoding with n_predict=64. Agreement and similarity are measured against the f32 GGUF baseline output.

| model | agreement vs f32 | similarity vs f32 |
| --- | --- | --- |
| t5-small-f16 | 0.990 | 0.998 |
| t5-small-q8_0 | 0.723 | 0.947 |
| t5-small-q5_k_m | 0.526 | 0.889 |
| t5-small-q4_k_m | 0.474 | 0.870 |
| t5-small-q4_0 | 0.417 | 0.837 |
| t5-small-q3_k_m | 0.375 | 0.814 |
| t5-small-q2_k | 0.287 | 0.660 |

Per-dataset generation metrics:

| dataset | model | exact match vs reference | chrF vs reference | agreement vs f32 | similarity vs f32 |
| --- | --- | --- | --- | --- | --- |
| CoLA | t5-small-f16 | 0.697 | 0.950 | 1.000 | 1.000 |
| CoLA | t5-small-f32 | 0.697 | 0.950 | - | - |
| CoLA | t5-small-q2_k | 0.697 | 0.950 | 1.000 | 1.000 |
| CoLA | t5-small-q3_k_m | 0.697 | 0.949 | 1.000 | 1.000 |
| CoLA | t5-small-q4_0 | 0.697 | 0.950 | 0.995 | 1.000 |
| CoLA | t5-small-q4_k_m | 0.698 | 0.950 | 0.999 | 1.000 |
| CoLA | t5-small-q5_k_m | 0.697 | 0.950 | 1.000 | 1.000 |
| CoLA | t5-small-q8_0 | 0.697 | 0.950 | 1.000 | 1.000 |
| summarization | t5-small-f16 | 0.000 | 0.133 | 0.979 | 0.995 |
| summarization | t5-small-f32 | 0.000 | 0.133 | - | - |
| summarization | t5-small-q2_k | 0.000 | 0.068 | 0.000 | 0.254 |
| summarization | t5-small-q3_k_m | 0.000 | 0.123 | 0.039 | 0.510 |
| summarization | t5-small-q4_0 | 0.000 | 0.123 | 0.071 | 0.550 |
| summarization | t5-small-q4_k_m | 0.000 | 0.131 | 0.137 | 0.642 |
| summarization | t5-small-q5_k_m | 0.000 | 0.128 | 0.210 | 0.689 |
| summarization | t5-small-q8_0 | 0.000 | 0.133 | 0.541 | 0.852 |
| translation en-de | t5-small-f16 | 0.020 | 0.361 | 0.989 | 0.999 |
| translation en-de | t5-small-f32 | 0.020 | 0.361 | - | - |
| translation en-de | t5-small-q2_k | 0.015 | 0.315 | 0.090 | 0.738 |
| translation en-de | t5-small-q3_k_m | 0.018 | 0.353 | 0.234 | 0.876 |
| translation en-de | t5-small-q4_0 | 0.019 | 0.357 | 0.304 | 0.905 |
| translation en-de | t5-small-q4_k_m | 0.019 | 0.359 | 0.380 | 0.920 |
| translation en-de | t5-small-q5_k_m | 0.019 | 0.359 | 0.448 | 0.935 |
| translation en-de | t5-small-q8_0 | 0.019 | 0.360 | 0.680 | 0.970 |
| translation en-fr | t5-small-f16 | 0.017 | 0.381 | 0.993 | 0.999 |
| translation en-fr | t5-small-f32 | 0.017 | 0.381 | - | - |
| translation en-fr | t5-small-q2_k | 0.007 | 0.276 | 0.057 | 0.646 |
| translation en-fr | t5-small-q3_k_m | 0.015 | 0.368 | 0.226 | 0.868 |
| translation en-fr | t5-small-q4_0 | 0.015 | 0.372 | 0.299 | 0.891 |
| translation en-fr | t5-small-q4_k_m | 0.017 | 0.377 | 0.380 | 0.919 |
| translation en-fr | t5-small-q5_k_m | 0.016 | 0.380 | 0.446 | 0.933 |
| translation en-fr | t5-small-q8_0 | 0.016 | 0.380 | 0.672 | 0.967 |

Interpretation:

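  • f16 is effectively equivalent to f32 for generated outputs (agreement 0.990, similarity 0.998).
  • q8_0 preserves most behavior (similarity 0.947) but still diverges on longer-form tasks: agreement falls to 0.541 on summarization and about 0.68 on translation.
  • q5_k_m and q4_k_m are usable middle points (agreement 0.526 and 0.474), depending on the size and quality target.
  • q2_k degrades heavily for summarization and translation, with agreement of 0.000 on summarization and below 0.10 on both translation sets.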
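
The document does not state how the overall agreement figures were aggregated. As a consistency check, the sketch below shows that an unweighted mean over the four datasets reproduces every reported overall value to three decimals. The variable and function names are hypothetical; the numbers are copied from the two tables above, in the order CoLA, summarization, translation en-de, translation en-fr.

```python
# Agreement vs f32 per dataset, copied from the per-dataset table.
PER_DATASET_AGREEMENT = {
    "t5-small-f16":    [1.000, 0.979, 0.989, 0.993],
    "t5-small-q8_0":   [1.000, 0.541, 0.680, 0.672],
    "t5-small-q5_k_m": [1.000, 0.210, 0.448, 0.446],
    "t5-small-q4_k_m": [0.999, 0.137, 0.380, 0.380],
    "t5-small-q4_0":   [0.995, 0.071, 0.304, 0.299],
    "t5-small-q3_k_m": [1.000, 0.039, 0.234, 0.226],
    "t5-small-q2_k":   [1.000, 0.000, 0.090, 0.057],
}

# Overall agreement vs f32, copied from the summary table.
REPORTED_OVERALL = {
    "t5-small-f16": 0.990,
    "t5-small-q8_0": 0.723,
    "t5-small-q5_k_m": 0.526,
    "t5-small-q4_k_m": 0.474,
    "t5-small-q4_0": 0.417,
    "t5-small-q3_k_m": 0.375,
    "t5-small-q2_k": 0.287,
}


def macro_average(values):
    """Unweighted mean across datasets."""
    return sum(values) / len(values)


for model, values in PER_DATASET_AGREEMENT.items():
    mean = macro_average(values)
    print(f"{model}: mean={mean:.5f} reported={REPORTED_OVERALL[model]:.3f}")
```

If every dataset contributes the same number of examples, as in the conversion check, the unweighted mean and the example-weighted mean coincide.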

Perplexity And KL Results

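Perplexity is reported per dataset. KL/token and top-1 disagreement are the main quantization drift metrics because they compare each quantized model directly against the f32 token distributions. Perplexity alone can mislead here: q8_0 scores a lower summarization perplexity than f32 (133.17 vs 138.59) even though its distribution has drifted from the baseline.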
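
The two drift metrics can be sketched from per-token logits as follows. This is a minimal pure-Python sketch, not the code used in this run. It assumes the KL direction is KL(f32 || quantized), which the document does not specify, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def kl_and_top1_disagreement(ref_logits, quant_logits):
    """Return (mean KL per token, fraction of tokens with a different top-1).

    ref_logits and quant_logits are lists of per-token logit vectors over
    the same vocabulary. Assumed direction: KL(reference || quantized).
    """
    if len(ref_logits) != len(quant_logits):
        raise ValueError("both models must score the same tokens")
    total_kl = 0.0
    disagreements = 0
    for ref, quant in zip(ref_logits, quant_logits):
        log_p = log_softmax(ref)
        log_q = log_softmax(quant)
        total_kl += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
        if argmax(ref) != argmax(quant):
            disagreements += 1
    n = len(ref_logits)
    return total_kl / n, disagreements / n


# Toy example: two token positions, three-entry vocabulary.
ref = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
quant = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0]]
kl, disagree = kl_and_top1_disagreement(ref, quant)
print(f"KL/token={kl:.4f} top-1 disagree={disagree:.2f}")
```

Because both metrics compare distributions position by position, they register drift even when the quantized model happens to assign similar probability to the reference text.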

Token-weighted summary across all datasets:

| model | tokens | KL/token | top-1 disagree |
| --- | --- | --- | --- |
| t5-small-f16 | 308,028 | 0.00000 | 0.0005 |
| t5-small-f32 | 308,028 | - | - |
| t5-small-q8_0 | 308,028 | 0.00187 | 0.0160 |
| t5-small-q5_k_m | 308,028 | 0.01004 | 0.0386 |
| t5-small-q4_k_m | 308,028 | 0.02038 | 0.0521 |
| t5-small-q4_0 | 308,028 | 0.04847 | 0.0704 |
| t5-small-q3_k_m | 308,028 | 0.05892 | 0.0897 |
| t5-small-q2_k | 308,028 | 0.27523 | 0.1914 |
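
Token weighting means each dataset contributes in proportion to its token count rather than equally. The sketch below illustrates the idea with the q8_0 per-dataset KL/token values reported in this document. The per-dataset token counts are hypothetical placeholders, because the document reports only the 308,028 total, so the printed weighted value is not expected to match the table.

```python
def token_weighted_mean(values, token_counts):
    """Average per-dataset values, weighting each by its token count."""
    if len(values) != len(token_counts):
        raise ValueError("values and token_counts must have the same length")
    total = sum(token_counts)
    return sum(v * n for v, n in zip(values, token_counts)) / total


# q8_0 KL/token per dataset: CoLA, summarization, en-de, en-fr.
q8_0_kl = [0.00029, 0.00194, 0.00191, 0.00181]

# HYPOTHETICAL token counts, for illustration only.
hypothetical_tokens = [10_000, 100_000, 100_000, 100_000]

unweighted = sum(q8_0_kl) / len(q8_0_kl)
weighted = token_weighted_mean(q8_0_kl, hypothetical_tokens)
print(f"unweighted={unweighted:.5f} token-weighted={weighted:.5f}")
```

The reported token-weighted value for q8_0 (0.00187) sits closer to the summarization and translation values than the unweighted mean of about 0.00149 does, which is what one would expect if CoLA contributes relatively few tokens.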

Per-dataset perplexity:

| model | CoLA | summarization | translation en-de | translation en-fr |
| --- | --- | --- | --- | --- |
| t5-small-f32 | 1.3490 | 138.5925 | 5.0317 | 3.8267 |
| t5-small-f16 | 1.3491 | 138.6029 | 5.0317 | 3.8268 |
| t5-small-q8_0 | 1.3494 | 133.1739 | 5.0314 | 3.8245 |
| t5-small-q5_k_m | 1.3498 | 139.2235 | 5.0748 | 3.8488 |
| t5-small-q4_k_m | 1.3535 | 155.2379 | 5.1135 | 3.8759 |
| t5-small-q4_0 | 1.3593 | 215.7687 | 5.1394 | 3.9305 |
| t5-small-q3_k_m | 1.3490 | 153.6497 | 5.2163 | 3.9680 |
| t5-small-q2_k | 1.3577 | 262.6867 | 6.0281 | 4.4851 |
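
For reference, perplexity is the exponential of the mean negative log-likelihood per target token. A minimal sketch, assuming natural-log token probabilities are already available (the function name is hypothetical):

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-probability of the target tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    reference token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that gives every reference token probability 0.5 has a
# perplexity of approximately 2.
print(perplexity([math.log(0.5)] * 4))
```

Since perplexity depends only on the probability of the reference tokens, a quantized model can match or beat the f32 value on a dataset while still differing from f32 elsewhere in the distribution.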

Per-dataset KL/token:

| model | CoLA | summarization | translation en-de | translation en-fr |
| --- | --- | --- | --- | --- |
| t5-small-f16 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| t5-small-q8_0 | 0.00029 | 0.00194 | 0.00191 | 0.00181 |
| t5-small-q5_k_m | 0.00544 | 0.01159 | 0.00923 | 0.00838 |
| t5-small-q4_k_m | 0.00811 | 0.02593 | 0.01732 | 0.01437 |
| t5-small-q4_0 | 0.01239 | 0.07497 | 0.02886 | 0.02339 |
| t5-small-q3_k_m | 0.00539 | 0.07696 | 0.04827 | 0.04073 |
| t5-small-q2_k | 0.00350 | 0.36274 | 0.22476 | 0.18650 |

Interpretation:

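  • The KL ranking is clear in the token-weighted summary and on summarization and both translation sets: f16, q8_0, q5_k_m, q4_k_m, q4_0, q3_k_m, then q2_k. CoLA is the exception, where q2_k (0.00350) and q3_k_m (0.00539) show lower KL/token than q4_k_m and q4_0.
  • q8_0 has very small distributional drift from f32 (0.00187 KL/token, 1.6% top-1 disagreement).
  • q5_k_m is the strongest quantization below 8 bits in this run (0.01004 KL/token, 3.9% top-1 disagreement).
  • q4_k_m is materially better than q4_0 by KL/token (0.02038 vs 0.04847) and top-1 disagreement (0.0521 vs 0.0704).
  • q2_k has the highest drift: 0.27523 KL/token and 19.1% top-1 disagreement overall, with KL/token between 0.19 and 0.36 on summarization and translation.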

Recommended Default

For T5-small in this workflow:

  • Use t5-small-f32.gguf as the reference baseline.
  • Use t5-small-q8_0.gguf when preserving behavior matters most.
  • Use t5-small-q5_k_m.gguf as the best compact default from this run.
  • Use t5-small-q4_k_m.gguf only when size constraints outweigh quality.
  • Avoid t5-small-q2_k.gguf for summarization or translation quality checks.
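
One way to turn these recommendations into a rule is to pick the most aggressive quantization whose token-weighted KL/token stays under a chosen budget. The sketch below is illustrative only. The KL values come from the summary table in this document; the `pick_model` helper and the budget values are hypothetical, and the list order assumes that nominal bit width tracks file size, which this document does not report.

```python
# (file name, token-weighted KL/token vs f32), ordered from most to least
# aggressive quantization. Assumption: this order also reflects file size.
KL_PER_TOKEN = [
    ("t5-small-q2_k.gguf", 0.27523),
    ("t5-small-q3_k_m.gguf", 0.05892),
    ("t5-small-q4_0.gguf", 0.04847),
    ("t5-small-q4_k_m.gguf", 0.02038),
    ("t5-small-q5_k_m.gguf", 0.01004),
    ("t5-small-q8_0.gguf", 0.00187),
    ("t5-small-f16.gguf", 0.00000),
]


def pick_model(max_kl_per_token):
    """Return the most aggressive quantization within the KL budget."""
    for name, kl in KL_PER_TOKEN:
        if kl <= max_kl_per_token:
            return name
    return "t5-small-f32.gguf"  # fall back to the reference baseline


print(pick_model(0.015))  # budget that admits q5_k_m but not q4_k_m
print(pick_model(0.005))  # tighter budget that only q8_0 and f16 meet
```

A KL budget is only one possible criterion; a task that depends on exact generated text may need a threshold on agreement vs f32 instead.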

License: Google's T5-small is released under Apache 2.0. These GGUF files follow and adopt the same license.

Model format: GGUF. Model size: 60.5M params. Architecture: t5.