F5-TTS 4-bit 4-step Distill v327

This is a weights-only custom runtime bundle for the Agent Kernel Lite Peyton voice F5-TTS 4-step student checkpoint.

It keeps the F5-TTS CFM DiT architecture and packs the large tensors as rowwise signed int4 with fp16 scales. Loading requires an F5-TTS-compatible runtime that understands the f5tts-q4-bundle-v0 format, plus a compatible Vocos vocoder.
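
As a rough illustration of what "rowwise signed int4 with fp16 scales" can mean, the sketch below quantizes a 2-D matrix with one fp16 scale per row and unpacks it again. The nibble order (low nibble first), the two's-complement encoding, the symmetric max/7 scale, and the function names are all assumptions made for this example. The actual f5tts-q4-bundle-v0 layout is defined by the bundle's manifest and index files and may differ.

```python
import numpy as np

def quantize_int4_rowwise(weights):
    """Quantize a 2-D float array to packed signed int4, one fp16 scale per row.

    Assumed layout (hypothetical): two values per byte, low nibble first.
    """
    max_abs = np.abs(weights).max(axis=1, keepdims=True)
    # Floor the range so an all-zero row does not produce a zero scale.
    scales = (np.maximum(max_abs, 1e-4) / 7.0).astype(np.float16)
    q = np.clip(np.rint(weights / scales.astype(np.float32)), -8, 7).astype(np.int8)
    if q.shape[1] % 2:
        q = np.pad(q, ((0, 0), (0, 1)))  # pad to an even column count
    nibbles = (q & 0x0F).astype(np.uint8)  # two's-complement 4-bit codes
    packed = nibbles[:, 0::2] | (nibbles[:, 1::2] << 4)
    return packed, scales[:, 0]

def dequantize_int4_rowwise(packed, scales, cols):
    """Unpack int4 codes and multiply each row by its fp16 scale."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.int8)
    q[:, 0::2] = (packed & 0x0F).astype(np.int8)
    q[:, 1::2] = (packed >> 4).astype(np.int8)
    q = np.where(q >= 8, q - 16, q)[:, :cols]  # map codes 8..15 to -8..-1
    return q.astype(np.float32) * scales.astype(np.float32)[:, None]

w = np.array([[0.7, -0.8, 0.1], [1.4, 0.0, -0.7]], dtype=np.float32)
packed, scales = quantize_int4_rowwise(w)
restored = dequantize_int4_rowwise(packed, scales, cols=w.shape[1])
```

With this scheme each restored value differs from the original by at most about half of its row's scale, which is why a per-row scale helps when rows have very different magnitudes.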

Candidate

  • Model id: 20260605-peyton-q4-4step-v327
  • Source checkpoint: runs/checkpoints/f5tts_q4_4step_v327_teacher24nfe4_row3_from_v323/model_q4_4to4_best_rollout.pt
  • Recommended generation: 4 NFE steps, CFG around 0.45-0.55, 24 kHz audio
  • Q4 parameters: 335,472,640
  • Dense fp16 parameters: 1,624,196
  • Tensor payload: 171,601,360 bytes, about 163.65 MiB
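
The size figures above can be cross-checked with a little arithmetic. The sketch below assumes 4 bits per Q4 parameter and 2 bytes per dense fp16 parameter. Attributing the leftover bytes to fp16 row scales is an assumption, not something the card states; export_summary.json is the authoritative breakdown.

```python
Q4_PARAMS = 335_472_640
FP16_PARAMS = 1_624_196
PAYLOAD_BYTES = 171_601_360

q4_bytes = Q4_PARAMS // 2        # two 4-bit values per byte
fp16_bytes = FP16_PARAMS * 2     # two bytes per fp16 value
remainder_bytes = PAYLOAD_BYTES - q4_bytes - fp16_bytes
payload_mib = PAYLOAD_BYTES / 2**20

print(f"q4 payload:    {q4_bytes:,} bytes")
print(f"fp16 payload:  {fp16_bytes:,} bytes")
# Assumption: the remainder could hold per-row fp16 scales (2 bytes each).
print(f"remainder:     {remainder_bytes:,} bytes")
print(f"total payload: {payload_mib:.2f} MiB")
```

Under these assumptions the packed int4 data accounts for about 98% of the payload, and the total matches the stated 163.65 MiB.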

Evaluation Snapshot

Broad 4-step eval of the retained v327 checkpoint (selected outputs):

  • WER: 0.1264964568
  • phonetic WER: 0.0934223184
  • WavLM profile mean: 0.8814315548
  • repetition flagged outputs: 0
  • mean worst-segment high-band ratio: 0.1581710380
  • max worst-segment high-band ratio: 0.3323809601

Static-aware soft reselection from the same pool:

  • WER: 0.1334409012
  • phonetic WER: 0.1003667629
  • WavLM profile mean: 0.8761436492
  • mean worst-segment high-band ratio: 0.1135852739
  • max worst-segment high-band ratio: 0.2383649655
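
To make the trade-off between the two selections easier to read, the following sketch computes the relative change of each shared metric when moving from the retained selection to the static-aware soft reselection. The dictionary keys and the helper name are made up for this example; the numbers are copied from the lists above.

```python
retained = {
    "wer": 0.1264964568,
    "phonetic_wer": 0.0934223184,
    "wavlm_mean": 0.8814315548,
    "hb_mean": 0.1581710380,
    "hb_max": 0.3323809601,
}
static_aware = {
    "wer": 0.1334409012,
    "phonetic_wer": 0.1003667629,
    "wavlm_mean": 0.8761436492,
    "hb_mean": 0.1135852739,
    "hb_max": 0.2383649655,
}

def relative_change(before, after):
    """Signed relative change, e.g. -0.25 means a 25% reduction."""
    return (after - before) / before

for name in retained:
    change = relative_change(retained[name], static_aware[name])
    print(f"{name:>13}: {change:+.1%}")
```

The reselection lowers both high-band ratios by roughly 28%, at the cost of a WER increase of about 0.007 absolute (roughly 5.5% relative) and a slightly lower WavLM profile mean.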

The retained checkpoint is v327. Later row-only probes are not promoted:

  • v339: row-only normalized stress sample, not a retained checkpoint
  • v340: row-only Vocos pronunciation probe, not a retained checkpoint
  • v341: rejected; high static and semantic failure on Vocos/WebGPU/WASM

Files

  • manifest.json: bundle metadata and architecture description
  • export_summary.json: tensor counts and byte sizes
  • tensors.q4.bin: packed int4 tensor payload
  • tensor_q4_index.json: index for packed int4 tensors
  • tensors.fp16.bin: fp16 tensor payload
  • tensor_fp16_index.json: index for fp16 tensors
  • F5TTS_Base_vocab.txt: F5-TTS vocabulary
  • peyton_voice_q4_4step_v327.tar: app-ready voice archive
  • samples/v327_4step_row3_best_current.wav: current selected row3 sample
  • samples/v327_4step_row3_static_aware_soft.wav: lower-static row3 sample
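
Before handing the bundle to a runtime, it can help to confirm that the download is complete. The sketch below only checks that the files listed above exist in a local directory. The function name and the directory argument are hypothetical, and the JSON schemas are deliberately not interpreted because the card does not describe them.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "manifest.json",
    "export_summary.json",
    "tensors.q4.bin",
    "tensor_q4_index.json",
    "tensors.fp16.bin",
    "tensor_fp16_index.json",
    "F5TTS_Base_vocab.txt",
    "peyton_voice_q4_4step_v327.tar",
    "samples/v327_4step_row3_best_current.wav",
    "samples/v327_4step_row3_static_aware_soft.wav",
]

def missing_bundle_files(bundle_dir):
    """Return the expected files that are not present under bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_bundle_files("./bundle")` returns an empty list when every listed file is present.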

Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.
