F5-TTS 4-bit 4-step Distill v327
This is a weights-only custom runtime bundle for the Agent Kernel Lite Peyton voice F5-TTS 4-step student checkpoint.
It keeps the F5-TTS CFM DiT architecture and packs the large tensors as rowwise
signed int4 with fp16 scales. Loading requires an F5-TTS-compatible runtime that
understands f5tts-q4-bundle-v0, plus a compatible Vocos vocoder.
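The rowwise signed int4 scheme described above can be illustrated with a small NumPy sketch. It is not the bundle's actual on-disk layout: the nibble order (low nibble first), the two's-complement encoding, the symmetric max/7 scale, and the function names `quantize_rowwise_int4` and `dequantize_rowwise_int4` are all assumptions made for illustration. A real loader should follow manifest.json and the tensor index files.

```python
import numpy as np

def quantize_rowwise_int4(w):
    """Quantize a 2-D float array to signed int4 with one fp16 scale per row.

    Assumed layout (hypothetical, not taken from the bundle): two values per
    byte, even column in the low nibble, odd column in the high nibble.
    The number of columns must be even.
    """
    w = np.asarray(w, dtype=np.float32)
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Symmetric scale so the largest magnitude in a row maps to about 7.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float16)
    q = np.clip(np.rint(w / scale.astype(np.float32)), -8, 7).astype(np.int8)
    nib = (q & 0x0F).astype(np.uint8)          # two's-complement nibble
    packed = nib[:, 0::2] | (nib[:, 1::2] << 4)
    return packed, scale

def dequantize_rowwise_int4(packed, scale):
    """Invert quantize_rowwise_int4 up to rounding error."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    # Sign-extend 4-bit values: 8..15 represent -8..-1.
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.int8)
    q[:, 0::2] = lo
    q[:, 1::2] = hi
    return q.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8)).astype(np.float32)
packed, scale = quantize_rowwise_int4(weights)
restored = dequantize_rowwise_int4(packed, scale)
```

Under these assumptions each row costs half a byte per weight plus two bytes for its scale, and the reconstruction error per element stays within roughly half of that row's scale.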
Candidate
- Model id: 20260605-peyton-q4-4step-v327
- Source checkpoint: runs/checkpoints/f5tts_q4_4step_v327_teacher24nfe4_row3_from_v323/model_q4_4to4_best_rollout.pt
- Recommended generation: 4 NFE steps, CFG around 0.45-0.55, 24 kHz audio
- Q4 parameters: 335,472,640
- Dense fp16 parameters: 1,624,196
- Tensor payload: 171,601,360 bytes, about 163.65 MiB
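As a rough sanity check, the payload size can be related to the parameter counts listed above. The sketch below assumes 4 bits per Q4 parameter and 2 bytes per fp16 parameter. The leftover bytes would be consistent with per-row fp16 scales, but that attribution is an assumption; export_summary.json is the place to confirm the real breakdown.

```python
# Figures copied from the candidate list above.
Q4_PARAMS = 335_472_640
FP16_PARAMS = 1_624_196
PAYLOAD_BYTES = 171_601_360

q4_bytes = Q4_PARAMS // 2        # two 4-bit values per byte
fp16_bytes = FP16_PARAMS * 2     # two bytes per fp16 value

# Bytes not explained by the weights themselves. Assumption: mostly
# fp16 row scales, possibly with some padding.
remainder_bytes = PAYLOAD_BYTES - q4_bytes - fp16_bytes

payload_mib = PAYLOAD_BYTES / 2**20

print(f"q4 weights:   {q4_bytes:,} bytes")
print(f"fp16 weights: {fp16_bytes:,} bytes")
print(f"remainder:    {remainder_bytes:,} bytes")
print(f"payload:      {payload_mib:.2f} MiB")
```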
Evaluation Snapshot
Retained broad selected v327 4-step eval:
- WER: 0.1264964568
- phonetic WER: 0.0934223184
- WavLM profile mean: 0.8814315548
- repetition flagged outputs: 0
- mean worst-segment high-band ratio: 0.1581710380
- max worst-segment high-band ratio: 0.3323809601
Static-aware soft reselection from the same pool:
- WER: 0.1334409012
- phonetic WER: 0.1003667629
- WavLM profile mean: 0.8761436492
- mean worst-segment high-band ratio: 0.1135852739
- max worst-segment high-band ratio: 0.2383649655
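The trade-off between the two selections is easier to see as relative changes. The following sketch only restates the numbers reported above; the dictionary keys and the helper `relative_change` are illustrative names, not part of any evaluation tooling.

```python
# Metrics reported for the retained selection and the static-aware reselection.
retained = {
    "wer": 0.1264964568,
    "phonetic_wer": 0.0934223184,
    "wavlm_profile_mean": 0.8814315548,
    "mean_high_band_ratio": 0.1581710380,
    "max_high_band_ratio": 0.3323809601,
}
static_aware = {
    "wer": 0.1334409012,
    "phonetic_wer": 0.1003667629,
    "wavlm_profile_mean": 0.8761436492,
    "mean_high_band_ratio": 0.1135852739,
    "max_high_band_ratio": 0.2383649655,
}

def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before

changes = {
    name: relative_change(retained[name], static_aware[name])
    for name in static_aware
}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1%}")
```

On these figures the static-aware reselection gives up a few percent of WER and phonetic WER in exchange for a noticeably lower worst-segment high-band ratio.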
The retained checkpoint is v327. Later row-only probes are not promoted:
- v339: row-only normalized stress sample, not a retained checkpoint
- v340: row-only Vocos pronunciation probe, not a retained checkpoint
- v341: rejected; high static and semantic failure on Vocos/WebGPU/WASM
Files
- manifest.json: bundle metadata and architecture description
- export_summary.json: tensor counts and byte sizes
- tensors.q4.bin: packed int4 tensor payload
- tensor_q4_index.json: index for packed int4 tensors
- tensors.fp16.bin: fp16 tensor payload
- tensor_fp16_index.json: index for fp16 tensors
- F5TTS_Base_vocab.txt: F5-TTS vocabulary
- peyton_voice_q4_4step_v327.tar: app-ready voice archive
- samples/v327_4step_row3_best_current.wav: current selected row3 sample
- samples/v327_4step_row3_static_aware_soft.wav: lower-static row3 sample
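Before pointing a runtime at a downloaded copy of the bundle, it can help to confirm that every listed file is present. This is a minimal sketch using only the standard library; the helper name `missing_bundle_files` and the example directory are hypothetical, and the check does not validate file contents.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "manifest.json",
    "export_summary.json",
    "tensors.q4.bin",
    "tensor_q4_index.json",
    "tensors.fp16.bin",
    "tensor_fp16_index.json",
    "F5TTS_Base_vocab.txt",
    "peyton_voice_q4_4step_v327.tar",
    "samples/v327_4step_row3_best_current.wav",
    "samples/v327_4step_row3_static_aware_soft.wav",
]

def missing_bundle_files(bundle_dir):
    """Return the expected files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

# Example: "./f5tts-bundle" is a placeholder for the local download directory.
missing = missing_bundle_files("./f5tts-bundle")
if missing:
    print("Missing files:", ", ".join(missing))
```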
Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.