tritllm-codec / KNOWN_ISSUES.md
Entrit's picture
fix: address codex review BLOCKERs and SHOULD-FIXes; update KNOWN_ISSUES
6c2b514 verified

Known limitations — tritllm-codec

Items previously raised in code review have been addressed in the current release. This document only lists deliberate design tradeoffs that the codec review surfaced, not bugs.

Design tradeoffs

Scale codebook upper bound = max(group_abs_maxes)

Where: quantize_model_v2.py, trit_quantize_scales, log_max = np.max(...)

The 27-entry log-spaced scale codebook spans [log_min, log_max] where log_max is taken to be the maximum group magnitude in the matrix. This is intentional — an earlier 99.9th-percentile bound (commit prior to 0c16d24) clipped large-scale outlier groups and lost their resolution.

The downside: a single extreme-scale outlier group can stretch the log-spaced range and reduce scale resolution for the bulk of normal-magnitude groups in the same matrix.

We do not see this cause measurable quality regressions on Qwen2.5, Llama-3.1, or Mistral-7B. If you observe unexpectedly high PPL on a new model family with heavy-tailed scale distributions, this is the first place to look.

We did not change this in the current release because changing it would alter the bit-exact output of the codec and invalidate published paper numbers; a future v3 may replace np.max with a soft-cap (e.g. min(max, 4 * p99)) that is robust to single extreme outliers without giving up large-scale fidelity.

Scale candidate set is fixed at 4 percentiles

Where: quantize_model_v2.py, compute_best_scale_4cand

The MSE-best scale is selected from four fixed order statistics — indices [gs-6, gs-4, gs-2, gs-1] of sorted |w|. This is a deliberate compute / quality tradeoff (≈50× speedup over an exhaustive sweep, <1% PPL gap measured on Qwen2.5-7B), not a bug. The function name and docstring now reflect this.