F5-TTS 4-bit 4-step Distill v327

This is a weights-only bundle, in a custom runtime format, for the 4-step F5-TTS student checkpoint of the Agent Kernel Lite Peyton voice.

It keeps the F5-TTS CFM DiT architecture and stores the large tensors as rowwise signed int4 values with one fp16 scale per row. Loading requires an F5-TTS-compatible runtime that understands the f5tts-q4-bundle-v0 format, plus a compatible Vocos vocoder.
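A minimal sketch of what rowwise signed int4 with fp16 scales can look like. It assumes symmetric quantization (scale = max |w| / 7, codes clamped to [-8, 7]) and two codes per byte with the low nibble first. The actual f5tts-q4-bundle-v0 nibble order, value range, and scale rule are not specified here, so every helper name and layout choice below is hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical layout: symmetric per-row int4, scale = max|w| / 7,
# two codes per byte with the low nibble first. The real bundle may differ.

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_row(row: List[float]) -> Tuple[List[int], float]:
    """Quantize one row to signed int4 codes plus a single fp16 scale."""
    amax = max(abs(v) for v in row)
    scale = to_fp16(amax / 7.0) or 1.0  # guard all-zero rows and fp16 underflow
    codes = [max(-8, min(7, round(v / scale))) for v in row]
    return codes, scale

def pack_int4(codes: List[int]) -> bytes:
    """Pack two signed 4-bit codes per byte (low nibble first, odd rows zero-padded)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((lo & 0x0F) | ((hi & 0x0F) << 4)
                 for lo, hi in zip(codes[0::2], codes[1::2]))

def unpack_int4(data: bytes, count: int) -> List[int]:
    """Recover `count` codes by sign-extending each 4-bit two's-complement nibble."""
    codes = []
    for byte in data:
        for nib in (byte & 0x0F, byte >> 4):
            codes.append(nib - 16 if nib >= 8 else nib)
    return codes[:count]

def dequantize_row(data: bytes, scale: float, count: int) -> List[float]:
    """Approximate the original weights as code * scale."""
    return [c * scale for c in unpack_int4(data, count)]
```

Under this assumed layout a row of n weights costs ceil(n / 2) bytes of codes plus 2 bytes for its scale, and each reconstructed weight is within half a scale step of the original.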

Candidate

Model id: 20260605-peyton-q4-4step-v327
Source checkpoint: runs/checkpoints/f5tts_q4_4step_v327_teacher24nfe4_row3_from_v323/model_q4_4to4_best_rollout.pt
Recommended generation: 4 NFE steps, CFG around 0.45-0.55, 24 kHz audio
Q4 parameters: 335,472,640
Dense fp16 parameters: 1,624,196
Tensor payload: 171,601,360 bytes, about 163.65 MiB
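The recommended 4 NFE steps and CFG strength can be illustrated with a toy flow-matching sampler. This sketch assumes uniform Euler steps from t = 0 (noise) to t = 1 and the guidance rule v_cond + cfg * (v_cond - v_uncond); the two velocity functions are hypothetical stand-ins for the DiT. Upstream F5-TTS inference also offers sway sampling of the timesteps, which is not modeled here.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]

def cfg_velocity(v_cond: Vector, v_uncond: Vector, cfg: float) -> Vector:
    """Assumed guidance rule: push the conditional velocity away from the unconditional one."""
    return [c + cfg * (c - u) for c, u in zip(v_cond, v_uncond)]

def sample_euler(x0: Vector, cond_fn: VelocityFn, uncond_fn: VelocityFn,
                 nfe: int = 4, cfg: float = 0.5) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with `nfe` uniform Euler steps."""
    x = list(x0)
    dt = 1.0 / nfe
    for i in range(nfe):
        t = i * dt
        # Each step evaluates both branches; a runtime may batch them into one call.
        v = cfg_velocity(cond_fn(x, t), uncond_fn(x, t), cfg)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy check with constant velocity fields (not a real model).
x1 = sample_euler([0.0, 0.0], lambda x, t: [1.0, -2.0], lambda x, t: [0.5, 0.0], nfe=4, cfg=0.5)
print(x1)  # [1.25, -3.0]
```

With constant fields, four steps of size 0.25 land exactly on x0 plus the guided velocity, which keeps the example easy to verify by hand.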
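As a sanity check, the tensor payload can be reconstructed from the parameter counts above. The sketch assumes two int4 values per byte and two bytes per fp16 value; treating the leftover bytes as fp16 row scales is an assumption, since the bundle may also contain padding or alignment.

```python
Q4_PARAMS = 335_472_640
FP16_PARAMS = 1_624_196
PAYLOAD_BYTES = 171_601_360

q4_bytes = Q4_PARAMS // 2        # two int4 values per byte
fp16_bytes = FP16_PARAMS * 2     # two bytes per fp16 value
remainder = PAYLOAD_BYTES - q4_bytes - fp16_bytes
implied_rows = remainder // 2    # only if the remainder is entirely fp16 row scales
mib = PAYLOAD_BYTES / 2**20

print(q4_bytes, fp16_bytes, remainder, implied_rows, f"{mib:.2f} MiB")
# 167736320 3248392 616648 308324 163.65 MiB
```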

Evaluation Snapshot

Broad 4-step evaluation of the retained v327 checkpoint (current selection):

WER: 0.1264964568
phonetic WER: 0.0934223184
WavLM profile mean: 0.8814315548
outputs flagged for repetition: 0
mean worst-segment high-band ratio: 0.1581710380
max worst-segment high-band ratio: 0.3323809601
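WER is the word-level edit distance between a reference transcript and an ASR transcript of the generated audio, divided by the number of reference words. A minimal sketch follows; the text normalization, ASR model, and whether the reported figures are averaged per utterance or pooled over the corpus are not documented here, so this illustrates the metric rather than the exact evaluation.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # prev[j] holds the edit distance between the first i reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
```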

Static-aware soft reselection from the same pool (lower static, slightly higher WER):

WER: 0.1334409012
phonetic WER: 0.1003667629
WavLM profile mean: 0.8761436492
mean worst-segment high-band ratio: 0.1135852739
max worst-segment high-band ratio: 0.2383649655
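The two selections trade intelligibility against static. The short script below computes absolute and relative differences from the figures above (the dictionary keys are illustrative names); by this arithmetic, static-aware reselection lowers the mean and max worst-segment high-band ratios by about 28% while adding roughly 0.007 to both WER and phonetic WER.

```python
retained = {"wer": 0.1264964568, "phonetic_wer": 0.0934223184, "wavlm": 0.8814315548,
            "hb_mean": 0.1581710380, "hb_max": 0.3323809601}
static_aware = {"wer": 0.1334409012, "phonetic_wer": 0.1003667629, "wavlm": 0.8761436492,
                "hb_mean": 0.1135852739, "hb_max": 0.2383649655}

# Positive deltas mean the static-aware selection scored higher on that metric.
deltas = {k: static_aware[k] - retained[k] for k in retained}
relative = {k: deltas[k] / retained[k] for k in retained}

for k in retained:
    print(f"{k:>12}: {deltas[k]:+.4f} ({relative[k]:+.1%})")
```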

The retained checkpoint is v327. Later row-only probes are not promoted:

v339: row-only normalized stress sample, not a retained checkpoint
v340: row-only Vocos pronunciation probe, not a retained checkpoint
v341: rejected because of high static and semantic failures on Vocos/WebGPU/WASM

Files

manifest.json: bundle metadata and architecture description
export_summary.json: tensor counts and byte sizes
tensors.q4.bin: packed int4 tensor payload
tensor_q4_index.json: index for packed int4 tensors
tensors.fp16.bin: fp16 tensor payload
tensor_fp16_index.json: index for fp16 tensors
F5TTS_Base_vocab.txt: F5-TTS vocabulary
peyton_voice_q4_4step_v327.tar: app-ready voice archive
samples/v327_4step_row3_best_current.wav: current selected row3 sample
samples/v327_4step_row3_static_aware_soft.wav: lower-static row3 sample
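The paired index and payload files suggest a simple lookup in which each index entry locates a tensor's bytes inside the matching .bin file. The index schema is not described here, so the "offset" and "nbytes" fields and the tensor name below are a hypothetical layout for illustration only; check the real tensor_q4_index.json before relying on it.

```python
import io
import json
from typing import BinaryIO, Dict

# Hypothetical index schema (not the documented format):
# {"tensor_name": {"offset": <int>, "nbytes": <int>, ...}, ...}
def load_index(index_text: str) -> Dict[str, dict]:
    return json.loads(index_text)

def read_tensor_bytes(payload: BinaryIO, entry: dict) -> bytes:
    """Return the raw bytes for one tensor, given its (hypothetical) index entry."""
    payload.seek(entry["offset"])
    data = payload.read(entry["nbytes"])
    if len(data) != entry["nbytes"]:
        raise ValueError("payload is shorter than the index claims")
    return data

# Self-contained demo with an in-memory payload; real use would open
# tensors.q4.bin in binary mode and read tensor_q4_index.json as text.
demo_payload = io.BytesIO(b"\x00\x01\x02\x03\xaa\xbb")
demo_index = load_index('{"example.weight": {"offset": 4, "nbytes": 2}}')
print(read_tensor_bytes(demo_payload, demo_index["example.weight"]).hex())  # aabb
```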

Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.
